ABSTRACT

The deaminase-like fold includes, in addition to nu- 
cleic acid/nucleotide deaminases, several catalytic 
domains such as the JAB domain, and others 
involved in nucleotide and ADP-ribose metabolism. 
Using sensitive sequence and structural compari- 
son methods, we develop a comprehensive natural 
classification of the deaminase-like fold and show 
that its ancestral version was likely to operate on 
nucleotides or nucleic acids. Consequently, we pre- 
sent evidence that a specific group of JAB domains 
are likely to possess a DNA repair function, distinct 
from the previously known deubiquitinating peptid- 
ase activity. We also identified numerous previously 
unknown clades of nucleic acid deaminases. Using 
inference based on contextual information, we sug- 
gest that most of these clades are toxin domains of 
two distinct classes of bacterial toxin systems, 
namely polymorphic toxins implicated in bacterial 
interstrain competition and those that target dis- 
tantly related cells. Genome context information 
suggests that these toxins might be delivered via 
diverse secretory systems, such as Type V, Type 
VI, PVC and a novel PrsW-like intramembrane 
peptidase-dependent mechanism. We propose that 
certain deaminase toxins might be deployed by 
diverse extracellular and intracellular pathogens, as
well as by endosymbionts, as effectors targeting nucleic
acids of host cells. Our analysis suggests that 
these toxin deaminases have been acquired by 
eukaryotes on several independent occasions 
and recruited as organellar or nucleo-cytoplasmic 
RNA modifiers, operating on tRNAs, mRNAs and 
short non-coding RNAs, and also as mutators of 
hyper-variable genes, viruses and selfish elements. 



This scenario potentially explains the origin of mu- 
tagenic AID/APOBEC-like deaminases, including 
novel versions from Caenorhabditis, Nematostella 
and diverse algae and a large class of fast-evolving 
fungal deaminases. These observations greatly 
expand the distribution of possible unidentified mu- 
tagenic processes catalyzed by nucleic acid 
deaminases. 

INTRODUCTION 

Enzymes of the deaminase superfamily catalyze deaminations
of bases in nucleotides and nucleic acids in
diverse biological contexts (1). Representatives that act on
free nucleotides or bases, such as the cytidine deaminases 
(CDD/CDA), deoxycytidylate monophosphate deamin- 
ases (dCMP), and guanine deaminase (GuaD) are primar- 
ily involved in the salvage of pyrimidines and purines, or 
in their catabolism in bacteria, eukaryotes and phages (2). 
Certain derived versions of these enzymes, such as the 
Blasticidin S deaminase and the RibD deaminase, have 
been recruited for deamination events in the biosynthesis 
of modified nucleotides (that might be incorporated into 
antibiotics like Blasticidin S) or cofactors (3,4). In 
contrast, other members of the deaminase superfamily 
catalyze the in situ deamination of bases in both RNA 
and DNA. Such modifications play a central role in 
RNA editing, which is critical for generating the appro- 
priate anti-codon sequences for decoding the genetic code, 
modification of the sequences of microRNA and other 
transcripts and alteration of the reading frames in 
mRNAs, defense against viruses via hypermutation-based 
inactivation, and somatic hypermutation or class switch- 
ing of antigen receptor genes in vertebrates (1,5-8). In 
addition to the deaminase superfamily, deamination of 
standalone bases is also catalyzed by structurally unre- 
lated amidohydrolases that display other protein folds, 
such as the CodA-like cytosine deaminases and Amd1-like
AMP deaminases with a TIM Barrel fold (9) and 
Escherichia coli Dcd-like dCTP deaminases with the 
dUTPase fold (10). However, currently, only members of 
the deaminase superfamily have been implicated in in situ 
nucleic acid modifications leading to RNA editing or 
DNA hypermutation, and are accordingly termed 
nucleic acid deaminases. 

Of these nucleic acid deaminases, the tRNA adenosine 
deaminases, Tad2/TadA comprise the most widespread 
clade, and are found across bacteria and eukaryotes. 
They catalyze the deamination of adenosine to inosine at 
the wobble position of the anti-codon of particular 
tRNAs, which is critical for degenerate codon decoding 
during translation (11,12). In trypanosomes, these enzymes
have also been shown to catalyze cytosine to uracil de- 
amination in ssDNA; however, the biological significance 
of this modification remains poorly understood (13). All 
other clades of nucleic acid deaminases show more re- 
stricted or sporadic phyletic patterns (14). The eukaryotic 
tRNA deaminase, Tad1, is involved in conversion of A to I
at position 37 of tRNA(Ala), required to stabilize
codon-anti-codon interactions (15). Its metazoan-specific paralog,
the adenosine deaminase ADAR is involved in the inacti- 
vation of RNA viruses by hypermutation, and in editing 
of diverse mRNAs, siRNAs and miRNA precursors (16). 
The activation-induced deaminase (AID) and some of its 
close relatives have been implicated in DNA deamination 
in the mutagenic diversification of antibodies and variable 
lymphocyte receptors of gnathostomes and agnathans 
(8,17). Additionally, DNA repair in response to their mu- 
tagenic action might play a role in the demethylation of 
5-methylcytosine in vertebrate DNA (18,19). AID belongs 
to a vertebrate-specific radiation of nucleic acid deamin- 
ases, which includes the poorly characterized APOBEC2 
and APOBEC4, and others such as the mammalian 
APOBEC1 implicated in mRNA editing, and the various 
tetrapod-specific APOBEC3 paralogs involved in inactivation
of retroviruses, hepadnaviruses and retroelements via
hypermutation of their nascent template DNA (8,17,20,21).
A distinct clade of nucleic acid deaminases, prototyped by the
plant PPR DYW domains, has only been reported in land 
plants and in the amoeboflagellate Naegleria (6,22). The 
characterized DYW-type deaminases are implicated in 
chloroplast and mitochondrial transcript maturation via 
numerous C to U editing events. The recently 
characterized CDAT8 deaminase, which catalyzes a C to 
U modification at the acceptor stem hairpin in tRNAs, is 
currently only detected in the archaeon Methanopyrus 
kandleri (23). 

Sporadic distribution of nucleic acid deaminases and
their rapid evolution due to positive selection often
confound the interrelationships between the various families
in standard phylogenetic analyses. While some aspects of 
the overall relationships have been identified by previous 
structural comparisons (8,17), the sudden emergence of 
these distinct families remains an unsolved mystery. 
Recently, we identified a large and diverse array of 
deaminase superfamily domains in a novel class of bacter- 
ial toxin systems (24). These toxin systems, of which the 
proteobacterial contact-dependent growth inhibition
(CDI) system is an experimentally characterized prototype,
are implicated in intraspecific competition and possibly
kin recognition (24-26). In these systems, the toxin
module is usually at the C-terminus of a multidomain 
protein that is secreted or attached to the cell surface. 
Upon contact with another cell, the toxin module is de- 
livered to the recipient cell and its toxicity depends on the 
catalytic activity of the toxin domain (24,25). The toxin 
modules in these systems are highly variable and typically 
contain nuclease domains belonging to distinct protein 
folds (e.g. HNH/ENDOVII, EndoU, restriction endo- 
nuclease and cytotoxin RNAse) that cleave DNA or dif- 
ferent RNAs in the target cells (24,25). The deaminase 
domains are the other major class of toxin domains; 
like the nuclease toxins, they are predicted to target
nucleic acids in target cells. Preliminary analyses suggested 
that these toxin deaminase domains might provide new 
leads regarding the origins of the more sporadically distributed
nucleic acid deaminase domains (24,27). In addition,
the very origin of the deaminase superfamily, with its pre- 
dominantly bacterial and eukaryotic phyletic pattern, is 
also mysterious. Structural comparisons have suggested 
that the deaminase domain shares a distinct α + β fold
(the deaminase-like fold) with other superfamilies of pro- 
teins such as the JAB domain (28), the aminoimidazole- 
4-carboxamide ribonucleotide (AICAR) transformylase 
domain of PurH (a purine biosynthesis enzyme) (29), 
and the formate dehydrogenase accessory subunit 
(E. coli FdhD) (30) (see SCOP database). While displaying 
a deaminase-like fold, the latter superfamilies do not con- 
tain the characteristic active site residues of the deaminase 
superfamily, although of these, the JAB domain coordin- 
ates a metal ion in a similar position. 

Hence, in this study we sought to integrate sequence 
and structure analysis along with different sources of con- 
textual information from gene neighborhood and domain 
architectures to address questions pertaining to the origin, 
higher order relationships and evolution of the deaminase 
superfamily. In particular, we wanted to understand the 
emergence of the deaminase catalytic active site and its 
relationship to the substrate-binding sites of other non- 
deaminase members of the fold, apropos their evolution- 
ary history. As a result, we identified a recurrent theme in 
the different superfamilies of the deaminase-like fold, 
namely, the conservation of a spatially similar substrate- 
binding pocket, in spite of the difference in the locations of 
the actual residues that bind substrates or mediate cataly- 
sis. We also show that the deaminase superfamily had a 
primarily bacterial origin, though the deaminase-like fold 
itself might be traceable to the last universal common 
ancestor (LUCA). A major radiation of the deaminase 
superfamily happened in the context of bacterial toxin 
systems resulting in at least nine distinct clades. We 
further show that the origins of most major sporadically 
distributed lineages of eukaryotic nucleic acid deaminases 
involved in organellar RNA editing, DNA hypermutation 
and anti-viral defense can be traced back to bacterial toxin 
deaminases. This analysis also helped us predict several 
novel eukaryotic deaminases, suggesting that editing, 
hypermutation and defensive deployment of deaminases 
might be more widespread than was previously known. 
METHODS

Iterative sequence profile searches were performed using 
the PSI-BLAST (31) and JACKHMMER (32) programs 
run against the non-redundant (NR) protein database of 
National Center for Biotechnology Information (NCBI). 
Similarity-based clustering for both classification and 
culling of nearly identical sequences was performed using 
the BLASTCLUST program (ftp://ftp.ncbi.nih.gov/blast/documents/blastclust.html).
The HHpred program was
used for profile-profile comparisons (33). Structure simi- 
larity searches were performed using the DaliLite program 
(34). Multiple sequence alignments were built by the 
Kalign (35) and PCMA (36) programs, followed by 
manual adjustments on the basis of profile-profile and 
structural alignments. Secondary structures were predicted 
using the JPred (37) and PSIPred (38) programs. For pre- 
viously known domains, the Pfam database (39) was used 
as a guide, though the profiles were augmented by addition 
of newly detected divergent members that were not present 
in the original Pfam models. Clustering with 
BLASTCLUST followed by multiple sequence alignment 
and further sequence profile searches were used to identify 
other domains that were not present in the Pfam database. 
Signal peptides and transmembrane segments were de- 
tected using the TMHMM (40) and Phobius (41) 
programs. Contextual information from prokaryotic gene 
neighborhoods was retrieved by a custom PERL script 
that extracts the upstream and downstream genes of the 
query gene and uses BLASTCLUST to cluster the proteins 
to identify conserved gene neighborhoods. Phylogenetic 
analysis was conducted using an approximately maximum- 
likelihood method implemented in the FastTree 2.1 
program under default parameters (42). Structural visual- 
ization and manipulations were performed using the 
VMD (43) and PyMol (http://www.pymol.org) 
programs. The in-house TASS package, which comprises 
a collection of Perl scripts, was used to automate aspects 
of large-scale analysis of sequences, structures and genome 
context (Anantharaman,V., Balaji,S. and Aravind,L., un- 
published data). 
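
The gene-neighborhood step described above can be illustrated with a minimal Python sketch. It is not the Perl script used in this study: the function names, the `flank` window size, the `min_genomes` threshold and the precomputed cluster labels (standing in for BLASTCLUST output) are all assumptions made for illustration.

```python
from collections import Counter


def neighborhood(genes, query, flank=3):
    """Return the genes within `flank` positions up- and downstream of `query`.

    `genes` is a list of gene identifiers in chromosomal order
    (a hypothetical input format, not the one used in the study).
    """
    i = genes.index(query)
    start = max(0, i - flank)
    return genes[start:i] + genes[i + 1:i + 1 + flank]


def conserved_neighbors(genomes, queries, cluster_of, flank=3, min_genomes=2):
    """Count, per protein cluster, the genomes in which a member of that
    cluster lies next to the query gene, and keep the recurrent clusters.

    genomes    : {genome name: ordered list of gene identifiers}
    queries    : {genome name: identifier of the query gene in that genome}
    cluster_of : {gene identifier: cluster label}; assumed to come from a
                 prior similarity-based clustering step
    """
    counts = Counter()
    for name, genes in genomes.items():
        nearby = neighborhood(genes, queries[name], flank)
        # Use a set so that each cluster is counted once per genome.
        clusters = {cluster_of[g] for g in nearby if g in cluster_of}
        counts.update(clusters)
    return {c: n for c, n in counts.items() if n >= min_genomes}
```

Clusters that recur next to the query gene in several genomes are candidates for a conserved gene neighborhood; the threshold for "several" is a judgment call that this sketch leaves to the caller.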

RESULTS AND DISCUSSION 

Analysis of the deaminase-like fold 

Identification of a conserved substrate-binding pocket in the 
deaminase-like fold. Both structural searches using DALI 
and the SCOP database identify five major sequence super- 
families within the deaminase fold (Figures 1 and 2). These 
include the deaminases, the JAB domain, the penultimate 
and C-terminal domains responsible for AICAR 
formylation in the bifunctional PurH protein, the 
C-terminal domain of the formate dehydrogenase acces- 
sory subunit (E. coli FdhD) and an uncharacterized family 
prototyped by Thermotoga maritima TM1506 (Pfam 
DUF1893) (44). The core of the deaminase fold contains 
a sheet of four strands in the 2134 order with strand-1
anti-parallel to the remaining strands of the sheet
(Figures 1 and 2). The first two strands form a hairpin
and are preceded by an α-helix (Helix-1). This is
followed by another α-helix (Helix-2) and the remaining
two strands are separated by a third α-helix (Helix-3).
Additionally, the fold also contains a highly variably positioned
fifth strand that can stack either parallel or anti-parallel
to strand-4. In the CDA/CDD clade of cytidine deaminases
and in JAB domains, strand-5
forms a hairpin with strand-4 and is thereby anti-parallel
to it, whereas, in all the remaining deaminase families and
non-deaminase lineages, an α-helix (Helix-4) separates
strands 4 and 5, resulting in strand-5 stacking parallel to 
strand-4 (17) (Figures 1 and 2). Further, the AICAR 
transformylase domain and the deaminase-fold domain 
in FdhD share an extra strand that stacks in an anti- 
parallel orientation to strand-5. In the AICAR trans- 
formylases, this strand is circularly permuted to the 
N-terminus of the deaminase-like fold. 

An analysis of available crystal structures and 
conserved residues of well-characterized enzymatic 
families provides us a glimpse of the distribution of the 
substrate binding and catalytic residues across members of 
this fold. Both cytidine deaminases and JAB domains co- 
ordinate a zinc ion lodged in a structurally similar location 
between helices-2 and -3 of the core fold (Figure 1). The 
zinc ion plays a comparable role in the deaminase or pep- 
tidase reaction, by activating a water molecule, which 
forms a tetrahedral intermediate with the carbon atom 
that is linked to the amine group. This is followed by de- 
amination of a base in deaminases, or peptide hydrolysis 
in JAB domain metallopeptidases (12,28). However, the 
type and spatial position of the residues that coordinate 
the zinc ion differ greatly between the two superfamilies. 
In the deaminase superfamily, the zinc ion is coordinated 
by a histidine (or cysteine) in the N-terminus of helix-2, a 
pair of cysteines in the first turn of helix-3 and a water 
molecule. An acidic residue, present two positions 
C-terminal to the helix-2 histidine (cysteine) serves as a 
general proton acceptor/donor during the reaction. In 
contrast, the zinc ion in the JAB domain is coordinated 
by a pair of histidine residues (HxH motif) at the end of 
strand-3, an aspartate residue in helix-3 and a water molecule.
In these proteins, a glutamate in strand-1 serves in
proton-transfer reactions, and a serine residue in helix-2
stabilizes the tetrahedral intermediate of the reaction
(Figure 1) (28). The AICAR transformylase domain of 
the bifunctional PurH enzyme catalyzes transfer of a 
formyl group from N-10-formyl-tetrahydrofolate to 
AICAR to produce 5-formyl-AICAR (FAICAR). 
FAICAR is then cyclized to inosine monophosphate 
(IMP) by an N-terminal IMP cyclohydrolase domain 
(45). The C-terminal transformylase region of this protein 
is comprised of two tandem domains displaying the 
deaminase-like fold that further dimerize. As a result, 
the N-terminal deaminase-like fold of one monomer forms 
a tail-to-head interaction with the C-terminal deaminase- 
like fold of the other. The active site is formed at the 
dimeric interface of the two monomer units and involves 
absolutely conserved lysine and histidine residues (KH 
motif) that form an acid-base pair and are present in the 
loop between strands 1 and 2 of the N-terminal unit (tail), 
and a highly conserved phenylalanine that functions as a 
pi hydrogen bond acceptor and is present in the extended
region between strand-3 and helix-3 of the C-terminal unit
(head) (45) (Figure 1). Other conserved residues binding
the substrate emerge from the C-terminal unit, and include
a highly conserved asparagine at the N-terminus of
strand-1 and an arginine at the beginning of helix-2
(Supplementary Data). Thus, the substrate binding or catalytic
residues vary greatly among the well-characterized
superfamilies of the deaminase-like fold (Figure 1).

Figure 1 panel labels: JAB (PDB: 2ZNV), AICAR transformylase (PDB: 1M9N), Tm1506 (PDB: 1VK9), FdhD (PDB: 2PW9)

Figure 1. Representative structures of the deaminase fold. All structural cartoons are shown in an approximately similar orientation. The α-helices
are colored purple, β-sheets yellow and loops gray. The predicted and known active site residues and substrates and ligands (if known) are labeled.
The β-strand which adopts different orientations in the two major deaminase divisions is shown in dark green. Surface diagrams are colored based on
their positions relative to the center of the structure (outside to inside: blue to red) to illustrate the binding cleft. For the JAB domain, only the
relevant portion of the dimeric Ub-substrate that interacts with the active site is rendered. Similarly, for the AICAR transformylase only the region
of the B chain (the other chain of the dimeric unit) that interacts with the active site pocket is rendered.

A comparison of the substrate-binding surfaces of the 
C-terminal deaminase-like fold domain of the AICAR 
transformylases and of the various deaminases co- 
crystallized with their substrates reveals the presence, in 
all instances, of a pocket which binds either a nucleotide, a 
base or its derivative, walled by the loop between helix-1
and strand-1, the loop between strand-2 and helix-2 and
an extended loop between strand-3 and helix-3 (Figure 1). 



In the JAB domain (e.g. PDB: 2ZNV), the lysine residue of
the ubiquitinated substrate binds the same pocket, close
to the Zn2+ ion-binding region, and the ubiquitin tail lies
along a groove between helices 2 and 3 of the JAB 
deubiquitinase (46). Although the substrate-binding 
region for FdhD is yet to be determined, an examination 
of its surface structure reveals a similarly positioned 
binding pocket (Figure 1). In FdhD, the pocket is com- 
prised of, or surrounded by, the most conserved residues 
of the superfamily: a highly conserved histidine at the beginning
of strand-1, and two arginine residues, at the beginning
of helix-2 and strand-3, respectively, suggesting a
role for them in substrate binding or enzyme catalysis 
(Figure 1 and Supplementary Data). The crystal structure 
of the uncharacterized TM1506 protein (44) reveals an 
unknown ligand in the same pocket. This ligand spatially
contacts a highly conserved polar residue (usually lysine),
which is present at the end of strand-3. Although the
identity of the TM1506 ligand is not known, the protein has
been shown to be ADP-ribosylated at aspartate-56 (44). The
diffraction density of the ligand in the crystal structure
indicates a relatively low molecular weight solute that is
likely to be ADP ribose itself or its precursor NAD.
Contextual analysis of the TM1506-like genes shows
that in firmicutes and bacteroidetes, they are linked in
predicted operons to genes encoding a Rossmann fold
aldo/keto reductase fused to a rubredoxin-like zinc
ribbon and a 5TM protein that is predicted to form a
channel (Supplementary Data). In bacteroidetes, the
TM1506 domain is also fused to a TonB-like receptor,
which is usually involved in the trafficking of small molecules
such as siderophores and peptide antibiotics (47).
These contextual associations, together with structural
evidence, suggest that TM1506 is likely to bind NAD or
ADP ribose, either to sense redox states by means of the
bound ligand or to function as a regulatory ADP ribosyltransferase.
Thus, rather than being an RNA-binding
protein, as originally proposed (44), it is likely to control
transport across the membrane by either regulating redox
potential or modifying substrates.

Figure 2. Reconstructed evolutionary history for the deaminase fold and key structural features. On the left is a reconstructed evolutionary history
of the deaminase fold. Individual lineages are listed to the right and grouped according to the classification given in the text and Table 1. The
inferred evolutionary depth of the lineages is traced by solid horizontal lines across the relative temporal epochs representing major evolutionary
transitional periods shown as vertical lines. Horizontal lines are colored according to their observed phyletic distributions; the key for this coloring
scheme is given at the bottom right of the figure. Dashes indicate uncertainty in terms of the origins of a lineage, while gray ellipses group lineages of
relatively restricted phyletic distribution with more broadly distributed lineages, indicating that the former likely underwent rapid divergence from the
latter. Known and predicted functions of the deaminases are shown next to the clade names. On the right are topologies of the two major divisions
within the deaminase superfamily. Insert positions characteristic of various deaminase lineages are marked in both the evolutionary history and
topology diagrams. The β-strands and α-helices of the conserved deaminase core are colored yellow and orange respectively. Additional structural
elements are colored dark green. Refer to the key for coloring schemes and abbreviations. Additionally, Fu: fungi, Pl: Plants, Na: Naegleria, Oo:
Oomycetes.

In summary, while the positions of the actual residues 
involved in substrate interaction or catalysis show great 
variation between the five superfamilies of the deaminase- 
like fold, the location of the bound substrate and the cor- 
responding substrate-binding pocket are well-conserved 
across all representatives. This suggests that the common 
ancestor of all superfamilies of the deaminase-like 
fold possessed an equivalent ligand-binding pocket. 
The presence of this binding pocket appears to have 
served as a constraint that restricted the evolution of sub- 
strate interaction and catalytic residues to a limited set of 
positions. Some of these appear to have been repeatedly 
favored, such as the residues from the end of strand-2
to the beginning of helix-2, and the region between the
end of strand-3 and the beginning of helix-3. Thus, the 
deaminase-like fold appears to represent a favorable 
scaffold that has allowed the exploration of a diverse set 
of alternatives in both substrate and chemical reaction 
space (48). 

Inference of nucleic acid or nucleotide-related functions for 
the ancestral deaminase-like fold domain. Analysis of the 
phyletic patterns of the various domain superfamilies 
adopting the deaminase-like fold revealed that the JAB 
domain alone has a widespread distribution in all the 
three superkingdoms of life: the proteasomal lid complex 
JAB domain metallopeptidases are universally conserved 
across eukaryotes and related versions are also present in 
practically all the major archaeal lineages. Similarly, the 
RadC-type JAB domains are present across most major 
bacterial lineages (Figure 2, Supplementary Data). This 
suggests that the JAB domain was likely to have been 
present in the LUCA. The deaminase, FdhD and 
AICAR transformylase superfamilies are present in most 
bacterial lineages (Figure 2 and Supplementary Data). The 
deaminase superfamily is infrequently found in archaea, 
but is present across all eukaryotes. Outside bacteria, the 
FdhD superfamily is sporadically present in archaea, 
while the AICAR transformylase superfamily is limited 
to a few eukaryotic lineages. The TM1506-like proteins 
are found in a restricted set of bacterial lineages, the 
firmicutes, bacteroidetes, actinobacteria, spirochaetes 
and Thermotoga (Figure 2 and Supplementary Data). 
Together, these phyletic patterns suggest that, other than 
the more ancient JAB domain, the remaining 
deaminase-like fold superfamilies originated in bacteria 
and were laterally transferred on different occasions to 
archaea and eukaryotes. Of these superfamilies, the 
deaminases and AICAR transformylase superfamily 
bind nucleotides, bases or related molecules (like 
AICAR). While the ligand of TM1506 remains un- 
characterized, as noted above, the available evidence 
favors a nucleotide or a related molecule (NAD or ADP
ribose). Structural analysis and certain shared features,
such as the presence of a sixth-strand packing with 
strand-5 (Figure 1), and lack of a catalytic metal, also 
indicate that the AICAR transformylase and the FdhD 
superfamilies share an exclusive common ancestor 
among the deaminase-like fold domains. Further, given
the role of AICAR transformylase in formyl transfer to a 
nucleotide precursor (45), it is conceivable that the related 
FdhD might bind a nucleotide or related molecule 
allosterically to regulate the formate dehydrogenase cata- 
lytic subunit. Thus, binding of a nucleotide or a related 
molecule appears to be a potential shared function across 
versions of the deaminase-like fold that originated in 
bacteria. 
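
The reasoning from phyletic patterns in this paragraph can be made explicit with a small Python sketch. The presence levels below are a coarse paraphrase of the distributions stated in the text, and both the three-level coding ("widespread", "sporadic", "absent") and the decision rule are assumptions of this sketch rather than the procedure used in the study, which also draws on structural and contextual evidence.

```python
# Coarse presence levels paraphrased from the text; the three-level coding
# is an assumption made for illustration.
PHYLETIC = {
    "JAB": {"bacteria": "widespread", "archaea": "widespread", "eukaryotes": "widespread"},
    "deaminase": {"bacteria": "widespread", "archaea": "sporadic", "eukaryotes": "widespread"},
    "FdhD": {"bacteria": "widespread", "archaea": "sporadic", "eukaryotes": "absent"},
    "AICAR transformylase": {"bacteria": "widespread", "archaea": "absent", "eukaryotes": "sporadic"},
    "TM1506": {"bacteria": "sporadic", "archaea": "absent", "eukaryotes": "absent"},
}


def infer_origin(pattern):
    """Apply a naive rule of thumb to one phyletic pattern.

    Widespread in all three superkingdoms -> candidate for presence in LUCA.
    Otherwise, present in bacteria        -> bacterial origin, with any
                                             archaeal or eukaryotic members
                                             attributed to lateral transfer.
    """
    if all(level == "widespread" for level in pattern.values()):
        return "LUCA candidate"
    if pattern.get("bacteria", "absent") != "absent":
        return "bacterial origin"
    return "unresolved"


for superfamily, pattern in PHYLETIC.items():
    print(superfamily, "->", infer_origin(pattern))
```

Under these assumptions only the JAB domain is flagged as a LUCA candidate, in line with the inference drawn in the text; a real analysis would weigh lineage-level distributions and phylogenies rather than three coarse labels.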

However, the characterized JAB domains appear to 
depart from this pattern by displaying peptidase activity, 
specifically in the context of the C-termini of ubiquitin-like 
proteins (UBLs). Such peptidase activity has been 
demonstrated or reliably inferred for JAB domains func- 
tionally associated with UBLs in the eukaryotic and pro- 
karyotic Ub systems and related evolutionarily mobile 
prokaryotic systems involved in cysteine and siderophore 
biosynthesis (49-53). However, analysis of genome 
contexts points to another previously unknown function 
of a large group of JAB domains typified by the E. coli 
RadC protein. While certain early genetic studies 
implicated RadC in DNA repair, there has been much 
uncertainty about its role in this regard (54,55). We 
observed that across several major bacterial lineages, the
JAB domain of RadC is fused to an N-terminal 
Helix-hairpin-Helix domain (HhH) that is often found in 
proteins involved in DNA replication and repair 
(Supplementary Data) (56). In various firmicutes and 
fusobacteria, a version of RadC (e.g. gi: 257462804) is
fused to the anti-restriction ArdC module, which we 
established to be comprised of two domains, an 
N-terminal a-helical domain and a C-terminal zincin-like 
metallopeptidase domain (Supplementary Data). This 
module has been shown to bind single-stranded DNA 
(57) and probably blocks the action of REases of restric- 
tion-modification systems, via cleavage by the zincin-like 
domain. A related version of RadC in fusobacteria (e.g. 
Fusobacterium nucleatum FNP_1834, gi: 254304164) is 
fused to a DNAG-like primase domain with an 
N-terminal DNA-binding Zn-ribbon (Supplementary 
Data). Finally, RadC-like domains are fused to a DinG/ 
RAD3-like superfamily II helicase in spirochaetes, 
deltaproteobacteria, planctomycetes, fusobacteria and 
firmicutes (Supplementary Data). In certain fusobacteria, 
the zinc ion coordinating residues of the RadC-type JAB 
domain appear to have been lost, suggesting that these 
may be functionally inactive. Interestingly, the DinG/ 
RAD3-like helicases with RadC-type JAB domains are 
closely related to versions that are fused to a 3'-5' exonuclease
domain of the RNaseH fold in place of the JAB
domain (Supplementary Data). The above contextual as- 
sociations strongly support a role for the RadC-type JAB 
domain in DNA repair. Non-homologous domain dis- 
placements involving functionally similar but structurally 
unrelated domains have been previously reported in 
several DNA-modifying enzymes in prokaryotes (58,59). 
The comparable fusions of closely related DinG/Rad3-like
helicases to either JABs or a 3'-5' exonuclease imply that,
by the principle of non-homologous domain displacement, 
the JAB might function as a nuclease. In instances where it 
is inactive, it may instead be a DNA-binding domain. In 
diverse methanogenic and halophilic archaea, a gene 
encoding a distinct archaeal clade of JAB domains is 
strongly associated in a predicted operon with a gene 
encoding a nucleotidyltransferase of the HIGH superfamily
(Supplementary Data). This suggests that at least a
subset of archaeal JAB domains might functionally 
interact with nucleotides. Thus, the primary bacterial 
clade of JAB domains (RadC) and certain archaeal JAB 
domains appear to function in the context of nucleic acids
or nucleotides, not unlike most of the other superfamilies 
of the deaminase-like fold. Based on the above observa- 
tions, one could reasonably infer that the ancestral version 
of the deaminase-like fold bound a nucleotide or a related 
molecule. 

As per the above inferences, the acquisition of peptidase 
activity was a secondary event in the evolution of the JAB 
superfamily. Unlike other peptidase superfamilies, which 
can act on a variety of peptide substrates, the peptidase 
activity of the JAB superfamily appears to be restricted 
solely to the UBL tail regions (46). This is consistent with 
the observation that the substrate-binding pocket of most 
JAB domains is eminently suited to bind a long narrow 
substrate like a single-stranded nucleic acid or a peptide 
strictly in the extended conformation (Figure 1). Thus, the 
substrate-binding pocket of the JAB domain is unlikely to 
be suitable as a general peptidase active site. Hence, it was 
probably recruited for such an activity only by virtue of its 
specific ability to recognize the distinctive extended con- 
formation of UBL tail regions with their characteristic 
small residues. Given the inference of the JAB domain 
in LUCA and the close relationship between it and
the deaminases in terms of a similarly bound, shared 
metal and catalytic chemistry, it is possible that the 
deaminases emerged from a JAB domain-like precursor 
in bacteria. This precursor is likely to have catalyzed the 
metal-dependent deamination of either free bases or 
nucleic acids. However, in light of the known (peptidase) 
and predicted (nuclease) hydrolytic activities of the JAB 
domain, it would be of interest to investigate if any of the 
members of the deaminase superfamily might possess nuclease
activity. The three remaining superfamilies are also likely 
to have emerged from such a precursor, through loss of 
the metal-binding site but retention of the ability to 
interact with a base or nucleotide-related substrate. This 
also suggests that the additional helix found between 
strands 4 and 5 in several versions of the deaminase-like 
fold, emerged on two independent occasions, once within 
the deaminase superfamily and a second time in the pre- 
cursor of all the metal-free superfamilies. 

Higher order classification and unique structural features 
of the deaminase superfamily 

Analysis of previously known members of the deaminase 
superfamily reveals two major divisions. Based on available 
structures, multiple sequence alignments and secondary 
structure predictions, the deaminase superfamily can be 
divided into two major divisions (Figures 2 and 3, Table 
1): (i) The C-terminal hairpin division is the first major 
deaminase division, in which strands 4 and 5 are 
anti-parallel to each other. Members of this division in- 
clude the CDD/CDA-like cytidine deaminases, Blasticidin 
S-deaminases, the DYW deaminases implicated in plant 
organellar RNA editing and plant Des/Cda deaminases 
(e.g. Arabidopsis DesA) with two deaminase domains of 
which only the N-terminal version is active. While 
members of this division most commonly have a cysteine 
in helix-2 as part of the CxE signature, some clades, such 
as the DYW deaminase, instead, have a HxE signature 
(Figures 2 and 3). Within this clade, the CDD/CDA 
deaminases, the plant Des/Cda deaminases and the 
Blasticidin S-deaminases form a monophyletic group 
and share several sequence synapomorphies (Figure 2 
and Table 1). (ii) The second major division of the 
deaminase superfamily is the Helix-4 division, in which 
the intervening helix-4 causes strands 4 and 5 to be 
parallel to each other (Figures 1 and 2; Table 1). This 
division includes the tRNA deaminases Tad2/TadA and 
its eukaryotic paralog Tad3, the tRNA deaminase Tad1 
and its metazoan paralog ADAR, the Methanopyrus 
tRNA editing deaminase, the dCMP deaminases 
(including the ComE-P2 clade of deaminases), the 
guanine deaminase GuaD, the RibD-like deaminase and 
the AID/APOBEC deaminases (Table 1 and Figure 2). 
These proteins are typified by a HxE signature in helix-2 
(Figure 3). 

Apart from the zinc-binding residues that are highly 
conserved across the fold, most deaminase clades can be 
distinguished by their unique lineage-specific sequence and 
structural features (Figure 2 and Table 1). A mapping of 
these on the structure of the deaminase-like fold shows 
that in most instances, these lineage-specific features 
form part of the substrate-binding pocket or are 
associated with it, and either they bind to or are predicted 
to bind to their substrates (Figure 2). For example, in 
Tad1-like deaminases, the lineage-specific residues 
include a conserved aspartate N-terminal to helix-2, two 
arginines in helix-2 and a lysine in helix-3 that project into 
the substrate-binding pocket. Further, an insert between 
strand-2 and helix-2 and a large three-stranded insert in 
the CxxC motif form caps over the substrate-binding 
pocket (Figure 3). Although these inserts are present 
throughout the Tad1 family, their sequence is not 
strongly conserved. Hence, rather than contributing 
directly to the active site, these inserts might form structurally 
mobile caps that regulate either substrate or 
solvent access to the active site. In dCMP deaminases, a 
comparable insert, which is supported by a distinct 
zinc-binding site, is present between strand-2 and helix-2 
(just upstream of the HxE motif; Figure 3). This insert 
also forms a cap over the active site and restricts access 
to the active site to just a soluble base. In the Tad2 family, 
additional C-terminal helices are present beyond the core 
fold. The first of these by means of a conserved phenyl- 
alanine residue (F144 in PDB 2b3j) contacts the base 
present at the +1 position (C35 in PDB 2b3j) with 
respect to the modified adenine in the tRNA substrate. 
Studies on the structure of the Tad2 family and sequence 
preferences in AID/APOBEC deaminases show that an 
extended loop between helix-4 and strand-4 is a key determinant 
of the target motif by selectively interacting with 
bases at the -1 and -2 positions with respect to the 
modified base (60). A comparison of the APOBEC2 and 
APOBEC3 structures with that of the substrate-bound 
TadA deaminase points to the potential importance of 
multiple structural features in choice of the target motif. 
The larger extended insert between strand-4 and helix-4 
contributes to notably reducing the aperture of the 
substrate-binding pocket in AID/APOBEC deaminases. 
This aspect, with a highly conserved tyrosine in the same 
loop, which could participate in base-stacking inter- 
actions, might be responsible for the cytosine specificity 
of these deaminases, as opposed to the adenine specificity 
of the related Tad2/TadA deaminases. Further, in the 
AID/APOBEC deaminases, the characteristic extended 
loop between helix-1 and strand-1 is likely to be responsible 
for determining the base at the +1 position (Figure 2). 
The DYW clade of deaminases (for which no structure is yet 
available) displays a highly conserved lysine between helix-1 
and strand-1 and a basic residue after the HxE motif 
(Figure 3). Their predicted location, based on comparisons 
with known structures, suggests that these residues are 
likely to be critical for interaction with RNA. The DYW 
clade also contains an insert between strand-2 and helix-2, 
which could form a cap over the substrate-binding pocket 
and possibly interacts with the substrate via a highly 
conserved arginine present in it (Figure 2). Yet another 
feature, restricted to the plant, Naegleria and rotifer versions and a 
single fungal version from Laccaria, is a distinct second 
metal-binding site formed by a pair of conserved histidines 
and cysteines. Our analysis suggests that members of the 
DYW clade possess all features of other catalytically 
active deaminases, consistent with their implied role in 
the numerous C to U deaminations in plant organelles. 
However, a recent study has claimed that some of them 
might be endoRNases (61); it remains unclear whether the 
reported nuclease activity is directly catalyzed 
by the deaminase domain or is a secondary consequence 
triggered by the base deamination. The above 
analysis suggests that a key feature in the evolution of 
the deaminase superfamily is the emergence of lineage- 
specific inserts and conserved residues that have helped 
in adapting the shared active site and binding pocket to 
recognize different substrates. 

Detection of novel members of the deaminase superfamily 
through sequence analysis. Given that the deaminase 
superfamily spans an extraordinary diversity in sequence 
space, a sensitive and exhaustive sequence analysis is 
required to comprehensively identify its members. This 
was further underscored by our recovery of novel 
members of the deaminase superfamily among the bacter- 
ial polymorphic toxins (24). These deaminases showed a 
much greater range of sequence divergence than that en- 
countered among the previously known members. They 
also pointed to the presence of unusual sequence features, 
such as the presence of a DxE signature in place of the 
usual CxE or HxE in the metal-chelating motif at the 
beginning of helix-2 (Figure 3). These observations 
prompted us to carry out a systematic search for deamin- 
ases using iterative sequence profile search methods as im- 
plemented in the PSI-BLAST and JACKHMMER 
programs and profile-profile comparisons as implemented 
in HHpred. Profile searches were also initiated with align- 
ments of various subfamilies using the HMMSEARCH 
program. 
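
The helix-2 signatures mentioned above (CxE, HxE or DxE), followed downstream by a CxxC motif, lend themselves to a simple pattern scan. The following Python sketch is illustrative only and is not the procedure used in this study: the function name, the regular expressions and the allowed spacing between the two motifs (15 to 80 residues) are assumptions chosen for the example. Such a scan can only flag candidates, which would still require confirmation by profile-based methods.

```python
import re

# Hypothetical gap bounds (in residues) between the helix-2 motif and the
# downstream CxxC motif; these numbers are illustrative, not from the paper.
MIN_GAP, MAX_GAP = 15, 80

# [CHD]xE: candidate metal-chelating signature at the start of helix-2.
# CxxC: candidate second pair of metal ligands further downstream.
# Lookaheads make the matches zero-width so overlapping candidates are kept.
HELIX2 = re.compile(r"(?=([CHD].E))")
CXXC = re.compile(r"(?=(C..C))")


def find_deaminase_like_motifs(seq, min_gap=MIN_GAP, max_gap=MAX_GAP):
    """Return (helix2_start, helix2_motif, cxxc_start, cxxc_motif) tuples.

    Positions are 0-based. A pair is reported when the CxxC candidate starts
    between min_gap and max_gap residues after the end of the [CHD]xE motif.
    """
    seq = seq.upper()
    cxxc_hits = [(m.start(), m.group(1)) for m in CXXC.finditer(seq)]
    pairs = []
    for m in HELIX2.finditer(seq):
        h_start, h_motif = m.start(), m.group(1)
        h_end = h_start + len(h_motif)
        for c_start, c_motif in cxxc_hits:
            gap = c_start - h_end
            if min_gap <= gap <= max_gap:
                pairs.append((h_start, h_motif, c_start, c_motif))
    return pairs
```

Calling `find_deaminase_like_motifs` on a protein sequence string returns every candidate motif pair; an empty list means that no pair with the assumed spacing was found.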

Seeds for these searches included the well-characterized 
versions of the superfamily, as well as representatives 
of the recently discovered toxin deaminases. Novel 
deaminase domains recovered in these searches were 
then used as queries for transitive searches to further 
expand the horizon of detected members. A systematic 
analysis was also performed on proteins that potentially 
contain deaminase-like metal-binding motifs in high- 
scoring segment pairs (hsp), but were recovered below 
the significance threshold in iterative profile searches. 
These were subject to profile-profile comparisons to 
confirm their inclusion in the deaminase superfamily. 
For example, PSI-BLAST searches with the N-terminal 
deaminase domain of human APOBEC3D (gi: 
22907041) as a query retrieved several distinct bacterial 
deaminases at significant e-values starting from the fourth 
iteration. Most of these bacterial deaminases were identi- 
fied as the toxin domain of polymorphic toxins (see 
below), and contain a DxE motif in place of the CxE/ 
HxE motif in helix-2 (e.g. Burkholderia pseudomallei 
BURPS668_1122, gi: 126439023, iteration 8, e = 10^-5). 
However, at borderline e-values, this search also recovered 
two further deaminases. One of them, the Streptomyces 
coelicolor SC4A7.11 (gi: 21220850; recovered in iteration 
12; e = 0.05) protein, is fused to a RicinB-like lectin 
domain. This protein contained a CxE motif in helix-2 
along with the CxxC motif in helix-4 (Figure 3). The 
second deaminase domain recovered was at the 
C-terminus of a gigantic protein from a prophage 
WOCauB3, integrated into the genome of Wolbachia, an 
endosymbiont of Cadra cautella (B3gp45 protein, gi: 
222825157; iteration 12; e = 0.1) (62). 
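
The transitive search strategy described above, in which newly recovered domains are themselves used as queries, amounts to expanding a set of sequences until no new members appear. The sketch below illustrates that logic in Python under stated assumptions: `search` and `is_significant` are hypothetical callables standing in for a real profile search (for example a wrapper around PSI-BLAST output) and a chosen e-value cutoff; neither is part of any existing library, and borderline hits would still need the separate profile-profile confirmation described in the text.

```python
from collections import deque


def transitive_search(seeds, search, is_significant):
    """Expand a set of sequence IDs by repeatedly searching with new hits.

    seeds: iterable of starting sequence IDs.
    search: callable mapping a query ID to an iterable of (hit_id, evalue).
    is_significant: callable deciding whether an e-value counts as a hit.
    Returns a dict mapping each recovered ID to the query that first
    retrieved it (seeds map to None).
    """
    found = {s: None for s in seeds}
    queue = deque(found)  # every recovered ID is used as a query exactly once
    while queue:
        query = queue.popleft()
        for hit_id, evalue in search(query):
            if hit_id not in found and is_significant(evalue):
                found[hit_id] = query
                queue.append(hit_id)
    return found
```

The loop terminates when the queue is empty, that is, when a full round of searches recovers no previously unseen sequence. The returned mapping records which query first retrieved each member, which makes the chain of transitive links easy to trace.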

Transitive searches initiated with the DxE-motif- 
containing versions recovered novel deaminase domains 
from proteobacteria, firmicutes, actinobacteria, cyanobac- 
teria, chlorobium and the eukaryotic intracellular parasite 
Perkinsus. Some of these searches also recovered a poten- 
tial deaminase from the intracellular bacterial pathogen 
Orientia tsutsugamushi (OTT_1508, gi: 189184415) at borderline 
e-values (e.g. query: Listeria monocytogenes 
LMHCC_1757, gi: 217965034, recovered the above in 
iteration 7, e = 0.1), which was confirmed to be a deaminase 
via profile-profile comparisons (HHpred p = 10^-10, 90% 
certainty hit to the deaminase domain, PDB: 2nyt). New 
PSI-BLAST searches initiated with the deaminase domain 
of Orientia OTT_1508 recovered homologous 
domains from other endoparasites and endosymbionts 
(e.g. Amoebophilus asiaticus Aasi_0969), eukaryotic 
ectopathogens (e.g. Xanthomonas XCV4233), free-living 
bacteria (Nakamurella Namu_1026, gi: 258651268, iteration 
2, e < 10^-3), certain apicomplexans (e.g. Toxoplasma 
TGME49_092320) and diverse fungi (e.g. Neurospora 
NCU5062, gi: 85079856, iteration 2, e ~ 10^-7). 

Table 1. Phyletic distribution and synapomorphies of deaminase clades 

The C-terminal hairpin division 

Clade: CDD/CDA cytidine deaminases 
Phyletic distribution: Bacteria, sporadic in archaea, eukaryotes 
Synapomorphies: C[H]AE in Hel-2 (H only in minority), PCxxCR motif in Hel-3, E at the end of Str-5 
Additional comments: Involved in pyrimidine salvage pathway; a distinct branch of this clade in oomycetes is fused to SAM and tudor domains. Ectocarpus has an inactive deaminase fused to 23 tudor domains 

Clade: Blasticidin S-deaminase (BSD) (CDD/CDA derived) 
Phyletic distribution: Firmicutes, actinobacteria, fungi 
Synapomorphies: Same as CDD/CDA 
Additional comments: Produces a modified base that is part of the antibiotic blasticidin S 

Clade: Plant Des/Cda (CDD/CDA derived) 
Phyletic distribution: Plants 
Synapomorphies: Same as CDD/CDA for N-terminal domain, C-terminal deaminase domain inactive 
Additional comments: Predicted editing deaminase 

Clade: LmjF36.5940-like^a (CDD/CDA derived) 
Phyletic distribution: Kinetoplastids, stramenopiles, chlorophytes, Perkinsus, Bdellovibrio 
Synapomorphies: Same as CDD/CDA 
Additional comments: Kinetoplastid versions are fused to CCCH domains, and also contain a C2C2 insert between Str-1 and Str-2. All other members are fused to a Rossmann fold domain at the N-terminus. Perkinsus homologs are fused to a C-terminal ubiquitin-binding Zn ribbon 

Clade: PITG_06599-like^a (CDD/CDA derived) 
Phyletic distribution: Haptophytes, stramenopiles 
Synapomorphies: Same as CDD/CDA, N-terminal deaminase domain lacks the first C of the CxxC motif, C-terminal deaminase domain inactive 
Additional comments: Contains two deaminase domains, both of which appear to be inactive 

Clade: DYW-like^a 
Phyletic distribution: Actinobacteria, bacteroidetes, firmicutes, gammaproteobacteria, ascomycetes, Laccaria, rotifer and oomycetes. LSE in land plants and Naegleria. Independent transfer to ascomycetes 
Synapomorphies: K between Hel-1 and Str-1, insert between Str-2 and Hel-2 with a basic residue, HxEK motif in Hel-2, D at the end of Str-4. The classical DYW family in plants and Laccaria contain an additional metal-binding cluster composed of two H residues and a C-terminal CxC motif. The ascomycete versions have a large insert between Str-3 and Hel-3 
Additional comments: Eukaryotic versions are editing deaminases. Associated domains in eukaryotes: PPR, TPR, Ankyrins. Secretion pathways: T2SS, T6SS, T7SS, PrsW related. Repeats: PAAR, RHS. Peptidases involved in delivery: HINT, PrsW. Immunity proteins: Imm5 

Clade: BURPS668_1122^a (gi: 126439023) 
Phyletic distribution: Actinobacteria, bacteroidetes, cyanobacteria, firmicutes, β-proteobacteria, γ-proteobacteria, Perkinsus 
Synapomorphies: RxxDxExK in Hel-2; insert between Str-2 and Hel-2; CxxCxS motif in Hel-3; many members are truncated after Hel-3 
Additional comments: Secretory pathways: T2SS, T5SS, T7SS (WxG and LDxD), terminase based, T6SS, SPVB. Repeats: Hemagglutinin, RHS, PAAR, Immunoglobulin. Peptidases involved in delivery: HINT, CPD-like thiol peptidase. Immunity proteins: Imm2, Imm3, SUKH 

Clade: Pput_2613^a (gi: 148547830) 
Phyletic distribution: Pseudomonas putida, Pseudomonas entomophila, Taylorella equigenitalis, Planctomyces maris 
Synapomorphies: Insert between Hel-1 and Str-1 and between Str-2 and Hel-2; HTE motif in Hel-2; PCxxCK motif in Hel-3 
Additional comments: Secretory pathways: T2SS, T6SS. Repeats: RHS, FN3, Immunoglobulin. Some associated with an inactive transglutaminase 

Clade: SCP1.201^a (gi: 21234196) 
Phyletic distribution: Actinobacteria, β-proteobacteria 
Synapomorphies: P at the beginning of Hel-1, insert between Str-2 and Hel-2, [HD]xEx[KQ] in Hel-2; N at the end of Str-3, related to the Burkholderia BURPS668_1122 family 
Additional comments: Secretory pathways: T2SS, T6SS, T7SS. Repeats: PAAR, ALF, RHS. Peptidases involved in delivery: HINT. Immunity proteins: Imm1, Imm4 

Clade: YwqJ^a (gi: 16080672) 
Phyletic distribution: Actinobacteria, bacteroidetes, cyanobacteria, firmicutes, fusobacteria, planctomycetes, proteobacteria, basidiomycota 
Synapomorphies: Gx[CH]xE in Hel-2; insert between Str-2 and Hel-2 contains a conserved histidine; insert between Str-3 and the CxxC motif; several members are truncated after Hel-3 or Str-4 
Additional comments: Secretory pathways: T2SS, T5SS, T7SS (N-terminal WxG or LDxD domains), SPVB. Repeats: RHS, ALF, PAAR, hemagglutinin. Immunity proteins: SUKH3, Imm6. Associations in polytoxins: HD hydrolase, C2-like peptidase, papain-like peptidase 

Clade: MafB19^a (gi: 254805593) 
Phyletic distribution: Actinobacteria, cyanobacteria, firmicutes, planctomycetes, proteobacteria 
Synapomorphies: N at the end of Str-2, HxE in Hel-2, V at the end of Str-3, +xxCxxC motif in Hel-3, G at the beginning of Str-4 
Additional comments: Secretory pathways: T2SS, T5SS, T6SS, MafBN-dependent secretion. Repeats: RHS, Hemagglutinin. Peptidases involved in delivery: HINT. Immunity proteins: SUFU, SUKH 

Helix-4 division 

Clade: TadA-Tad2 (ADAT2), Tad3 (ADAT3) 
Phyletic distribution: Pan-bacterial, eukaryotic, Tad3 pan-eukaryotes 
Synapomorphies: E before Str-1, N in Str-2, EPC1MC motif in Hel-2, basic residue after Str-4, two helices after Str-5, E and F conserved in first C-terminal additional helix 
Additional comments: tRNA editing deaminase; in eukaryotes Tad2 and Tad3 form a heterodimer; Tad3 lacks the E in the HxE motif; in several basidiomycetes, Tad3 is fused to a SET domain that might be involved in synthesis of a modified tRNA base or methylation of associated protein 

Clade: Bd3614^a (gi: 42524957), distinct branch of Tad2 clade 
Phyletic distribution: Bdellovibrio, chlorophytes 
Synapomorphies: R before Str-1, lacks the terminal Str-5, HAExN motif in Hel-2; shares M in the CxxC motif with Tad2, CxMxC, acidic residue at the end of Str-4 
Additional comments: In the neighborhood of a gene encoding the 23S rRNA G2445-modifying methylase. Fused to a distinct N-terminal globular domain 

Clade: Tad1, ADAR 
Phyletic distribution: Tad1: pan-eukaryotic; ADAR: only in metazoans 
Synapomorphies: D two residues before HxE motif, two adjacent arginines in Hel-2 that bind substrate, three-stranded insert in CxxC motif that forms a cap over substrate pocket, DK motif in Hel-3 of which the K binds substrate, R at the end of Hel-4 that contacts D of DH, additional hairpin after Str-5 that packs with Str-2 
Additional comments: Tad1 involved in tRNA(Ala) editing. Some ADARs are inactive, e.g. ADAD2 

Clade: RibD-like (diaminohydroxyphosphoribosylaminopyrimidine deaminase) 
Phyletic distribution: Pan-bacterial, sporadic in euryarchaea, plants, stramenopiles and choanoflagellates, Perkinsus 
Synapomorphies: HxE in Str-2, insert in CxxC motif that contains a conserved H, extended insert between Str-4 and Hel-4 
Additional comments: Riboflavin biosynthesis pathway. Some versions in plants are inactive; usually fused to a C-terminal DHFR reductase domain. In saccharomycete yeasts, the protein is further fused to S4 and pseudouridine synthase domains at the N-terminus 

Clade: Guanine deaminase 
Phyletic distribution: Pan-bacterial, sporadic in euryarchaea, eukaryotes 
Synapomorphies: Obligate dimer, insert between Str-2 and Hel-2, strand swapping of Str-5 between dimers, large helical insert between Str-4 and Str-5 
Additional comments: Catabolism of guanine 

Clade: dCMP deaminase and ComE 
Phyletic distribution: Pan-bacterial, sporadic in archaea, dsDNA viruses, eukaryotes 
Synapomorphies: Bihelical insert between Str-2 and Hel-2 that contains a Zn-binding motif with two C and a H, C between Hel-1 and Str-1 also contributes to this motif, NxxP at the end of Str-2, NA motif two residues after HxE motif, TxxxT in Str-3, Y between Str-4 and Hel-4 
Additional comments: Uracil biosynthetic pathway. Note: Methanopyrus RNA editing enzyme CDAT8 is a divergent member of this group 

Clade: AID/APOBEC 
Phyletic distribution: Vertebrates 
Synapomorphies: Extended loop between Hel-1 and Str-1, charged residue at the end of Str-1, W in Str-3, SxS just before the PCxxC motif in Hel-3, APOBEC-4 have a CxxxxxC signature in Hel-3, basic residue in extended loop between Str-4 and Hel-4, M at the end of Str-5, two additional helices after Str-5, F in first additional helix shared with the Tad2-TadA family, highly conserved W between the terminal helices, several basic residues in second terminal helix 
Additional comments: Mutagenic diversification of immunity molecules, mRNA editing, mutagenic anti-viral activity; lamprey PmCDA2 fused to a C-terminal AT-hook domain 

Clade: Novel AID/APOBEC-like, Caenorhabditis elegans ZK287.1^a (gi: 17566846) 
Phyletic distribution: Nematodes, Nematostella, Micromonas, Emiliania 
Synapomorphies: HxEE motif in Hel-2, insert in the CxxC motif of Hel-4, E in Str-5. Residues or elements shared with AID/APOBEC: extended loop between Str-4 and Hel-4; large hydrophobic residue (L/M) at the end of Str-5, two helices after Str-5, Da (a: aromatic) in the first additional C-terminal helix, W in second additional C-terminal helix 
Additional comments: Fast evolving homologs of the above deaminases. The Nematostella, Micromonas and Emiliania proteins contain a Zn-chelating domain inserted into the N-terminus of the deaminase domain; the nematode versions are fused at their N-terminus to eight repeats of a CxC-like domain 

Clade: Novel AID/APOBEC-like bacterial homologs, Wolbachia endosymbiont B3gp45^a (gi: 222825157) 
Phyletic distribution: Wolbachia endosymbiont of Cadra cautella, Pseudomonas brassicacearum 
Synapomorphies: R before Str-1, D at the end of Hel-2, KxxE motif in Hel-6. Residues/elements shared with classical AID/APOBEC deaminases: E in Hel-3, large hydrophobic residue (W) in Str-3, extended loop between Str-4 and Hel-4, V/M in Str-5, two additional helices after Str-5, D in first additional helix 
Additional comments: Secretory pathways: SPVB. Repeats: RHS 

Clade: XOO_2897^a (gi: 84624554) 
Phyletic distribution: Actinobacteria, firmicutes, β-, γ-, δ-proteobacteria 
Synapomorphies: E in insert between Str-3 and Hel-3, aromatic residue between Str-4 and Hel-4 shared with AID/APOBEC deaminases, truncation after Hel-4, Str-5 absent, a subset have an insert between Str-2 and Hel-2, this same subset has a C just before Str-1 
Additional comments: Secretory pathways: T2SS, T6SS, T7SS. Repeats: RHS, PAAR. Immunity proteins: SUKH4 

Clade: OTT_1508^a (gi: 189184415) 
Phyletic distribution: Actinobacteria, chloroflexi, cyanobacteria, fibrobacteres/acidobacteria, firmicutes, α- and gammaproteobacteria, fungi, Leishmania, Selaginella moellendorffii, Trichoplax adhaerens, Toxoplasma gondii, Neospora 
Synapomorphies: GxxK motif before the CxxC motif; extended loop between Str-4 and Hel-4 with a conserved polar (usually H) and axxP (a: aromatic); fungal proteins have a helical insert between Str-2 and Hel-2 
Additional comments: Secretory pathways: T7SS, PVC, T6SS. Peptidases involved in delivery: PVC metallopeptidase. Immunity: SUFU (fused). Polytoxins: HTH, DOC, ColE3, Kinase. Fungal version fused to an N-terminal α + β globular domain; apicomplexan versions fused to tRNA guanine transglycosylase domain; intracellular parasites may have more than one copy; some fungi have lineage-specific expansions of this family 

^a Indicates novel clades reported in this study. 

Transitive searches initiated with the Streptomyces coelicolor 
SC4A7.11-like deaminase domain recovered a completely 
different set of deaminase domains from actinobacteria 
and proteobacteria. Profile searches with the Wolbachia 
B3gp45, however, were unique in that they recovered only 
the vertebrate AID/APOBEC deaminases as best hits in 
PSI-BLAST (e ~ 0.01) and JACKHMMER searches 
(e = 6.4 x 10^-7). As in the above examples, we performed 
several exhaustive and recursive searches until no new 
deaminase domains were recovered. All retrieved proteins 
were clustered using the BLASTCLUST program, and 
clusters belonging to previously characterized groups 
were identified. Clusters that were not unified to any of 
the known groups were marked as potential founders of 
new groups. A progressive multiple alignment was 
constructed by first aligning individual clusters using the 
KALIGN and PCMA programs and then combining them 
into a super-alignment. From this, we also obtained sequence 
and structural features that are shared by each of the newly 
identified groups and used them to unify any of the new 
clusters with known clades (Table 1). 
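
The clustering step above can be pictured as grouping sequences that are connected by chains of pairwise similarity. The Python sketch below shows single-linkage grouping with a union-find structure; it is an illustration under stated assumptions and not a reimplementation of BLASTCLUST. The function name is hypothetical, and the list of similar pairs is assumed to have been computed beforehand from whatever pairwise score threshold is chosen.

```python
def single_linkage_clusters(ids, similar_pairs):
    """Group ids so that any chain of similar pairs ends up in one cluster.

    ids: list of sequence identifiers.
    similar_pairs: iterable of (id_a, id_b) pairs judged similar enough
        to be linked (the threshold is applied by the caller).
    Returns clusters as sorted lists, largest cluster first.
    """
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the cluster representative (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c))
```

Clusters that contain a member of a previously characterized group can then be labeled accordingly, while the remaining clusters correspond to the potential founders of new groups mentioned in the text.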

This systematic search uncovered thirteen novel clades 
of deaminases (Table 1). In this study, we uncovered pre- 
viously unknown bacterial, oomycete, rotifer and fungal 
representatives of the DYW clade. Of the novel clades, 
three clades typified by Streptomyces SCP1.201, 
Neisseria MafB19 and Xanthomonas XOO_2897 are 
found only in bacteria (Table 1). They are found sporadically 
across a wide range of bacteria, suggestive of dispersal 
by lateral transfer. Five clades, prototyped by 
Burkholderia BURPS668_1122, Streptomyces SCE41.26, 
Orientia OTT_1508, Bacillus YwqJ and DYW, are 
similar in phyletic profile to the above clades but, in 
addition to bacteria, are also present in one or a few eukaryotic 
lineages. Further, the DYW clade and the 
Orientia OTT_1508-like clade show massive 
lineage-specific expansions in land plants and basidiomycete 
fungi, respectively (Supplementary Data). The AID/ 
APOBEC-like clade was previously found only in verte- 
brates. However, searches initiated with APOBEC4 
deaminases retrieved matches to putative deaminase domains 
outside of vertebrates in nematodes (e.g. Caenorhabditis 
elegans ZK287.1), the cnidarian Nematostella, the chloro- 
phyte alga Micromonas and the haptophyte alga Emiliania 
that displayed a conservation pattern similar to the AID/ 
APOBEC clade (Table 1 and Figure 3). Profile-profile 
searches with these proteins recovered members of the 
AID/APOBEC clade (e.g. PDB: 2nyt; P = 10^-6; 95% certainty), 
confirming this relationship. These searches also recovered 
a related deaminase domain from the plant 
pathogenic bacterium Pseudomonas brassicacearum 
(PSEBR_m1207; gi: 330810772). Additionally, as noted 
above, the Wolbachia B3gp45 also showed a specific rela- 
tionship to the AID/APOBEC clade (Figure 3). These newly 
detected versions share with the classical AID/APOBEC 
deaminases an extended loop between strand-4 and helix-4, 
a large hydrophobic residue (mostly methionine) at the end 
of strand-5 and a characteristic Da (where a: aromatic, 
mostly W) motif at the beginning of helix-5. Wolbachia 
B3gp45 also shares a conserved tryptophan residue before 
the CxxC motif with the vertebrate AID/APOBEC-like 
proteins (Figure 3 and Supplementary Data). Thus, for the 
first time, we were able to define an extended AID/ 
APOBEC-like clade with members outside of vertebrates. 

All novel deaminase clades fall in either of the two major 
divisions of the deaminase superfamily. A systematic 
sequence-structure analysis of the novel clades showed 
that all of them can be grouped into either the 
C-terminal hairpin or the Helix-4 divisions (Figure 2). 
The clades typified by Burkholderia BURPS668_1122, 
Streptomyces SCP1.201, DYW, Bacillus YwqJ, 
Pseudomonas Pput_2613, Neisseria MafB19 and some 
novel divergent branches of the CDD/CDA-like clade 
are unified to the C-terminal hairpin division (Figure 2 and 
Table 1). In contrast, the AID/APOBEC-like clade and 
those typified by Xanthomonas XOO_2897 and Orientia 
OTT_1508 belong to the Helix-4 division (Figure 2 and 
Table 1). However, many of these newly detected clades 
show some unexpected deviations from the previously 
characterized template of the deaminase fold: (i) unlike 
most previously characterized clades of the C-terminal 
hairpin division, which display a CxE motif in helix-2, 
novel members of this division show notable variations. 
For instance, the BURPS668_1122 clade possesses a DxE 
motif, whereas, like the DYW clade, the clade typified by 
Neisseria MafB19 contains a HxE motif (Figure 3). The 
clades prototyped by Bacillus YwqJ and Streptomyces 
SCP1.201 each show internal variability in the same 
position, with both HxE and CxE motifs in the former 
and DxE or HxE motifs in the latter (Figure 3); 
(ii) another remarkable aspect seen only in a subset of 
the deaminases is the truncation of C-terminal structural 
elements. In the clades typified by Burkholderia 
BURPS668_1122, Bacillus YwqJ, Orientia OTT_1508 
and Xanthomonas XOO_2897, C-terminal elements after 
strand-3 show different degrees of degradation (Figure 3). 

The novel clades are also characterized by specific con- 
served signatures and inserts, which, as in the above- 
discussed examples, are associated with the substrate- 
binding pocket or form predicted caps above the pocket 
(Table 1). These features allowed us to discern the higher 
order relationships of the newly identified clades with 
respect to the previously characterized clades of the 
deaminase superfamily. The clades typified by Bacillus 
YwqJ, Burkholderia BURPS668_1122 and Streptomyces 
SCP1.201 appear to form a higher order group within 
the C-terminal hairpin division unified by an insert 
between strand-2 and helix-2 (Figure 2). The latter two 
clades are further unified by features such as a conserved 
polar residue (either lysine or glutamine) two residues 
downstream to the catalytic glutamate in helix-2. The 
clades typified by Xanthomonas XOO_2897 and Orientia 
OTT_1508 uniquely share with the AID/APOBEC-like 
clade the extended insert between strand-4 and helix-4, 
which is important for mutagenic motif choice and in 
the selection of cytosine for deamination. This suggests 
that these clades might be united into a higher order 
grouping, and might all deaminate cytosine. Yet, the 
marked sequence variability in this loop within and 
between the clades in this group suggests they are 
probably under selection for targeting distinct muta- 
genic motifs. In this context, the Nematostella and algal 
AID/APOBEC-like deaminases also display an insert of 
a Zn-binding domain between helix-1 and strand-1 
(Figure 4 and Supplementary Data). Given the predicted 
role of this region in determining the specificity at the -1 
position of the mutagenic motif, it is possible that this 
Zn-binding domain has a role in determining target specificity. 
In contrast, the nematode versions are unique in 
containing a distinct insert of a Zn-chelating domain 
between the two metal-coordinating cysteines of the 
deaminase active site, comparable with the similarly positioned 
insert in the Tad1 family (Figure 4 and 
Supplementary Data). Given its location, it is also likely 
to be critical for target sequence recognition. While most 
of the above clades are rapidly evolving and prone to 
C-terminal degeneration, they might further unify with the 
Tad2/TadA clade within the Helix-4 division (17). In 
support of this link, we had noted that the AID/ 
APOBEC clade shares additional helices after strand-5 
with Tad2/TadA (Figure 2). Another key insight provided 
by this classification is that many of the other newly 
defined clades combine members from both bacteria and 
eukaryotes. In addition to the AID/APOBEC-like, 
OTT_1508-like and DYW clades in which we found 
both bacterial and eukaryotic versions, we also identified 
novel eukaryotic deaminases in several other clades. Chief 
among these are the deaminase domains from the alveo- 
late Perkinsus (e.g. gi: 294948387) belonging to the 
BURPS668_1122 clade, from basidiomycete fungi (e.g. 
gi: 170114820 from Laccaria bicolor) belonging to the 
YwqJ clade, and from diverse unicellular eukaryotes (e.g. 
gi: 157877766 from Leishmania major) belonging to a 
novel branch of the CDD/CDA-like clade. 

Figure 4. Representative domain architectures of the deaminase superfamily. Proteins are denoted by their name, species and gi. Architectures are 
grouped based on the deaminase lineage in which they are present. Domains newly identified in this study are indicated by a blue margin. For the most 
part, standard domain names were used (as in PFAM). The various families of the SUKH superfamily of anti-toxins (e.g. Smi1, SUKH3 or SUKH4) 
are individually labeled. Other domain abbreviations: BactIG: a family of immunoglobulin fold domains found in bacteria; Bd3614N: N-terminal 
domain found in Bd3614-like deaminases; CPD: a Clostridium difficile Toxin A CPD type thiol peptidase; NT-α: N-terminal α-helical domain 
limited to firmicutes; PG_binding: peptidoglycan binding; various PT domains are pre-toxin domains; PseudoN: N-terminal domain limited to 
Pseudomonas; TM: transmembrane; Toxin_PL: predicted papain-like peptidase toxin; SP: signal peptide; Tail_Fiber: a phage tail fiber-like peptidase; 
Tu: tudor; X: uncharacterized globular domains; Y: novel Rossmann fold domain. MafBN is a Neisseria-specific domain involved in toxin 
delivery along with the MafA lipoprotein. 

Functional inference for the newly identified versions of 
the deaminase superfamily 

The new clades of prokaryotic deaminases define toxin 
domains of novel polymorphic and host-targeted toxin 
systems. Contextual information gleaned from predicted 
operons or conserved gene neighborhoods and domain 
architectures is an effective means of functional inference 
for poorly characterized proteins and domains (63,64). 
Such contextual information can be represented as net- 
works that also help in defining key functional themes 
pertaining to particular types of protein domains (65,66). 
Our analysis revealed that most of the novel prokaryotic 
deaminase clades uncovered in this study are toxin 
domains, thereby confirming and extending our previous 
investigations on the widespread prokaryotic polymorphic 
toxin systems. Our previous analysis had uncovered 
specific syntactical features of the domain architectures 
and genomic organization of polymorphic toxin systems 
(24): (i) complete toxin proteins in these systems show a
tripartite organization. N-terminal modules are involved
in secretion of the toxin protein via one of the
several prokaryotic secretory systems. These are followed
by central 'linker' modules that are involved in formation
of extended filamentous structures at the cell surface, such
as the RHS repeats or other low-complexity or α-helical
repeats. These are in turn followed by the C-terminal module,
which bears the elements required for delivery of the
toxin to the target cell and also the toxin domain itself
at the extreme C-terminus; (ii) the genome organization
of the toxin encoding gene is characterized by the presence 
of several, often unrelated standalone toxin cassettes 
which do not encode N-terminal trafficking modules. 
These might recombine with the 3'-end of the primary 
toxin gene to displace the pre-existent toxin module, and 
generate a diversity of toxins with the same N-terminal 
trafficking and delivery elements but different C-terminal 
toxin domains that usually operate on nucleic acids. 
Hence, these systems are termed polymorphic toxin 
systems; (iii) homologous toxin domains tend to diverge 
considerably from each other, often even between differ- 
ent strains of the same species; (iv) polymorphic toxin gene 
neighborhoods are often typified by the presence of one or 
more tightly linked genes encoding immunity proteins that
confer resistance to the host cell against both its own
toxin and invading ones. We previously identified
two widespread types of immunity proteins, belonging to 
the SUKH and SUFU superfamilies, which appear to 
mediate immunity by means of distinctive structural scaf- 
folds capable of recognizing a diverse set of protein 
ligands (24). 
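
The organizational syntax summarized in features (i) and (ii) can be illustrated with a minimal sketch. It is not part of the original analysis: the class name, function names and example domain labels below are hypothetical, and Python is used only for illustration. The sketch models a complete toxin as a tripartite protein and treats cassette recombination as a replacement of the C-terminal toxin domain alone, leaving the trafficking and delivery elements unchanged.

```python
from dataclasses import dataclass, replace
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class ToxinProtein:
    """Hypothetical model of the tripartite layout of a complete toxin."""
    trafficking: Tuple[str, ...]  # N-terminal secretion-related modules
    linker: Tuple[str, ...]       # central filament-forming repeats
    delivery: Tuple[str, ...]     # C-terminal delivery/pre-toxin elements
    toxin: str                    # toxin domain at the extreme C-terminus

    def architecture(self) -> str:
        """Return the domain architecture written from N- to C-terminus."""
        parts = self.trafficking + self.linker + self.delivery + (self.toxin,)
        return "+".join(parts)


def displace_toxin(primary: ToxinProtein, cassette_toxin: str) -> ToxinProtein:
    """Model recombination of a standalone cassette with the 3'-end of the
    primary toxin gene: only the C-terminal toxin domain is exchanged."""
    return replace(primary, toxin=cassette_toxin)


def polymorphic_variants(primary: ToxinProtein,
                         cassettes: Sequence[str]) -> List[ToxinProtein]:
    """List the primary toxin plus one variant per standalone cassette."""
    return [primary] + [displace_toxin(primary, c) for c in cassettes]


if __name__ == "__main__":
    # Illustrative labels only; not a specific protein from the study.
    primary = ToxinProtein(("WXG",), ("RHS",), ("PT",), "Deaminase")
    for variant in polymorphic_variants(primary, ["NucA", "WHH"]):
        print(variant.architecture())
```

Each variant keeps the same N-terminal trafficking and delivery elements but carries a different C-terminal toxin domain, which is the sense in which these systems are termed polymorphic.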

Indeed, all these features were clearly observed in several 
of the newly detected prokaryotic deaminase clades 
(Figures 4 and 5). They occurred either as the
C-terminal-most domain of a large polypeptide with distinct
N-terminal trafficking-related modules (see below) or
as a standalone toxin cassette encoded in a gene
neighborhood bearing a complete toxin gene. The gene
neighborhoods of the deaminase toxins also frequently 
contain additional standalone cassettes that could provide 
alternative toxin domains for the polymorphic toxin. 
These include several distinct nucleases (e.g. distinct
representatives of the HNH/EndoVII fold, namely the NucA,
WHH and DHNNK families, and representatives of the
restriction endonuclease fold), peptidases (e.g. a novel version of
the papain-like fold) and nucleic acid-binding domains 
(e.g. an AraC-like HTH that is predicted to function as 
a toxin) in addition to deaminases from distinct clades 
(Figure 5; Zhang, D., Iyer,L.M. and Aravind,L., manu- 
script in preparation) (24). At least four distinct deaminase 
clades are associated with genes encoding an immun- 
ity protein of the SUKH superfamily (Table 1 and 
Figure 5). Immunity proteins of the SUFU superfamily are
often associated with genes coding for deaminases belonging
to the clade prototyped by the Neisseria MafB19 toxin
and some representatives of the Orientia OTT_1508-like
clade (e.g. Salinispora Sare_4829, Figure 5). The conserved
syntax in the genomic organization of these toxin systems
(Figure 5) also allowed us to predict six previously unknown
immunity protein families (labeled 'Imm'
followed by a number; Supplementary Data). The most
widespread of these is the Imm1 family (e.g. SCP1.202)
that is found encoded in the neighborhood of some
deaminases of the SCP1.201 clade. We also detected
Imm1 in other polymorphic toxin systems
in actinobacteria, firmicutes, cyanobacteria, bacteroidetes
and proteobacteria, independently of the deaminase and with
alternative toxin domains. Secondary structure predictions
reveal an α + β fold with a conserved tryptophan at
the C-terminal end of this domain (Supplementary Data).
The secondary structure, with a prominent central sheet, is
reminiscent of the SUKH and SUFU superfamilies,
although we could not unify it with either of them.
Likewise, the predominantly α-helical Imm5, and the
α + β Imm6, which are associated with toxin deaminases
of the DYW clade and YwqJ, respectively
(Supplementary Data), are also seen in the context of 
other toxin domains across several phylogenetically 
distant bacteria. These observations suggest that, like 
immunity proteins of the SUFU and SUKH superfamilies, 
Imm1, Imm5 and Imm6 might provide structural scaffolds
that could potentially interact with multiple structurally
distinct toxin domains. The remaining predicted
immunity protein families are more limited in their distribution
and are primarily associated with the deaminase
domains of the BURPS668_1122 (Imm2, Imm3) and
SCP1.201 (Imm4) clades (Figure 5). 
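
The immunity protein associations described in this paragraph can be tabulated and inverted to ask which immunity families accompany a given deaminase clade. The following is a minimal sketch with hypothetical variable and function names; it encodes only the associations stated in the text. The SUKH superfamily is omitted because the text says it is linked to at least four clades without naming them here.

```python
from collections import defaultdict
from typing import Dict, List

# Immunity family -> deaminase clades, as stated in the text above.
# SUKH is left out on purpose: its associated clades are not named here.
IMMUNITY_TO_CLADES: Dict[str, List[str]] = {
    "SUFU": ["MafB19-like", "OTT_1508-like"],
    "Imm1": ["SCP1.201-like"],
    "Imm2": ["BURPS668_1122-like"],
    "Imm3": ["BURPS668_1122-like"],
    "Imm4": ["SCP1.201-like"],
    "Imm5": ["DYW-like"],
    "Imm6": ["YwqJ-like"],
}


def clades_to_immunity(mapping: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert the table: deaminase clade -> sorted immunity families."""
    inverted = defaultdict(list)
    for immunity_family, clades in mapping.items():
        for clade in clades:
            inverted[clade].append(immunity_family)
    return {clade: sorted(families) for clade, families in inverted.items()}


if __name__ == "__main__":
    for clade, families in sorted(clades_to_immunity(IMMUNITY_TO_CLADES).items()):
        print(clade, "<-", ", ".join(families))
```

A table of this kind only records co-occurrence in gene neighborhoods; it does not by itself establish that a given immunity protein neutralizes the linked toxin.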

We also recovered two novel organizational themes 
among deaminase toxins that departed from the classical 
organization of the polymorphic toxin systems. The first 
of these themes was characterized by toxins in which the 
C-terminal toxin module contains not one, but multiple 
unrelated toxin domains, each with very distinct catalytic 
activities (Figure 4). We accordingly term these toxins
polytoxins. For example, in Salinispora arenicola
Sare_4829 the C-terminal toxin module includes, in addition
to the deaminase domain, a second toxin domain
of the DOC superfamily, which AMPylates



9488 Nucleic Acids Research, 2011, Vol. 39, No. 22 



[Figure 5 graphic. Panel A: representative gene neighborhoods grouped by deaminase clade (BURPS668_1122, Pput_2613-like, YwqJ-like, MafB19-like, DYW-like, OTT_1508-like, SCP1.201-like, LmjF36.5940-like and XOO_2897-like clades), with secretion pathways indicated where known. Panel B: contextual network linking the deaminase clades to the terminase pathway, PVC, T6SS, T7SS, T2SS, T5SS and the membrane peptidase pathway.]

Deaminase clade numbering in panel B:
1: BURPS668_1122-like
2: SCP1.201-like
3: YwqJ-like
4: MafB19-like
5: DYW-like
6: XOO_2897-like
7: OTT_1508-like
8: AID/APOBEC-like
9: CDD-like
10: Tad1/ADAR-like
11: Pput_2613-like
12: Tad2-like


Figure 5. Gene neighborhoods and contextual connection network of the deaminase superfamily. (A) Individual genes are represented as arrows
pointing from the 5'- to the 3'-end of the coding frame. Genes were named according to their domain architectures. For each operon, the gene name,
species name and gi of the deaminase (marked with a star) are indicated. Uncharacterized genes are shown as small gray boxes. Where possible,
secretion pathways are indicated. Smi1, SUKH3 and SUKH4 are different clades of immunity proteins belonging to the SUKH superfamily.

threonines or serines in target proteins (66,67). Other
polytoxin proteins combine the deaminase in the same
polypeptide with other toxin domains, such as an HD
hydrolase (predicted to function as a cyclic nucleotide 
hydrolase), a ColE3-like nuclease and peptidases (Figure 
4; a novel papain-like, a Clostridium difficile Toxin A 
CPD-like and a C2-like peptidase; Zhang, D., Iyer,L.M. 
and Aravind,L., manuscript in preparation). These obser- 
vations suggest that polytoxins could be a variant of the 
classic polymorphic toxin systems, wherein multiple toxin 
domains are deployed simultaneously rather than through 
episodic displacement of the existing toxin domain by re- 
combination with a distinct cassette. The second novel 
theme among deaminase toxins (typically belonging to 
the AID/APOBEC-like, OTT_1508-like, and XOO_2897- 
like clades) is the presence of versions occurring independ- 
ently of a gene encoding an immunity protein. One or 
more of these deaminase toxins are commonly found in 
the genomes or integrated prophages of several phylogen- 
etically related as well as distant endosymbiotic or endo- 
parasitic bacteria, such as Orientia, Rickettsia, Wolbachia 
and Amoebophilus that infect a variety of eukaryotes. 
These are further distinguished from classical polymorphic 
toxin systems with immunity proteins, because the latter 
are usually not found in endosymbiotic or endoparasitic 
bacteria. Similar immunity protein-independent
deaminase toxins are also found in association with
certain secretion systems that deliver cargo into other
cells in extracellular pathogens of eukaryotes such as
Burkholderia and Xanthomonas, as well as in free-living
bacteria like Sorangium cellulosum and Streptomyces
(Table 1 and Figure 5; see below). The above examples 
closely parallel the deployment in host cells, by both extra- 
cellular and intracellular parasitic bacteria, of other toxin 
domains, such as the EndoU fold RNAse domain and the 
DOC AMPylating domain that are also shared with the 
classical polymorphic toxin systems (24,67). 

Experimental evidence from the proteobacterial poly- 
morphic toxins (i.e. the proteobacterial contact-dependent 
inhibitory systems) has shown that they are primarily 
deployed against closely related bacterial strains (25,26). 
This principle of action can be generally extended across 
all polymorphic toxins with linked immunity proteins by 
virtue of the fact that they possess a mechanism to defend 
against the action of their own toxins. This contention is 
also supported by the near-complete absence of such
immunity protein-containing systems in endosymbiotic
or endoparasitic bacteria, because these are less likely to
encounter competing cells from related strains in close
proximity. Thus, deaminase toxin domains from such
systems, like other toxin domains of polymorphic toxin
systems, are predicted to primarily operate in resource
competition between related bacterial strains. As a
corollary, they could also operate in discrimination
between kin and non-kin cells during cellular-aggregation 
phenomena such as biofilm or multicellular colony forma- 
tion. In contrast to these, the systems lacking immunity 
proteins are likely to be deployed as toxins against their 
eukaryotic hosts or against distantly related environmen- 
tal competitors. In terms of the potential targets of these 
deaminase toxins, it is likely that they edit/mutate RNA to 
disrupt protein synthesis in the target cells. Indeed, disrup- 
tion of protein synthesis through modification or cleavage 
of RNA is a widely used strategy by several unrelated 
toxins of different systems such as the polymorphic 
toxins (25), the plasmid-borne colicins (68), conventional 
toxin-anti-toxin systems (66) and virulence/defensive 
toxins of bacteria, plants, fungi and animals (69-72). 
This is also consistent with the above noted higher-order 
relationships between these bacterial clades of deaminases 
and known RNA-modifying deaminases such as the Tad2/ 
TadA tRNA deaminases and the DYW deaminases 
(Figure 2). Hence, especially in the case of the toxin
clades related to the former deaminases, tRNA could be
one target. The other possibility is that some of these
deaminases are analogs of the AID or APOBEC enzymes
and hypermutate DNA or mRNA, resulting in cell death
by disruption of the genome or synthesis of key proteins.
The deaminases secreted into the host cell by endosymbi- 
otic (e.g. Amoebophilus) or endoparasitic bacteria (e.g. 
Wolbachia, Orientia and Rickettsia) or injected into target 
cells by ectopathogenic bacteria (e.g. Xanthomonas) could 
possibly modify host physiology by RNA editing or 
altering gene expression by genome mutation. Evidence 
favoring a toxin function for the newly identified prokary- 
otic deaminase clades also provides an explanation for 
their extreme sequence divergence and structural malle- 
ability indicated by the independent loss of C-terminal 
structures: they are likely to face diversifying pressure 
from evolution of resistance against them due to 
sequence divergence of their targets and acquisition/emer- 
gence of new immunity proteins. In this sense, the diver- 
gence of these deaminases closely parallels that of other 
toxins that operate on nucleic acids (e.g. nucleases of the 
restriction endonuclease fold) relative to their homologs 
involved in core cellular functions (73). 
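
The inference drawn in the preceding paragraphs, that a linked immunity gene points to deployment against related strains whereas its absence points to hosts or distant competitors, can be written as a simple rule. This is a minimal sketch under stated assumptions: the function and enum names are hypothetical, the immunity family labels are those used in this study, and a real analysis would rest on sequence-based domain detection rather than label matching.

```python
from enum import Enum
from typing import Iterable


class Deployment(Enum):
    INTERSTRAIN = "resource competition between related bacterial strains"
    HOST_OR_DISTANT = "eukaryotic hosts or distantly related competitors"


# Immunity families named in the text (SUKH clades, SUFU, Imm1 to Imm6).
IMMUNITY_FAMILIES = {"Smi1", "SUKH3", "SUKH4", "SUFU"} | {
    f"Imm{i}" for i in range(1, 7)
}


def predict_deployment(neighborhood_domains: Iterable[str]) -> Deployment:
    """Predict the likely target class from the domains found in the
    gene neighborhood of a deaminase toxin (hypothetical rule of thumb)."""
    has_immunity = any(d in IMMUNITY_FAMILIES for d in neighborhood_domains)
    if has_immunity:
        return Deployment.INTERSTRAIN
    return Deployment.HOST_OR_DISTANT


if __name__ == "__main__":
    # Illustrative neighborhoods, not exact records from Figure 5.
    print(predict_deployment(["WXG", "Deaminase", "Imm5"]).value)
    print(predict_deployment(["VgrG", "Deaminase"]).value)
```

The rule is deliberately coarse: an immunity gene could be missed if it belongs to an as yet unrecognized family, so an absence call is weaker evidence than a presence call.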

The bacterial toxin deaminases are associated with diverse 
trafficking and release systems. Our analysis indicated that 
the trafficking and delivery systems might notably influ- 
ence the functional contexts in which a particular toxin 
deaminase might be deployed. Thus, the same clade of 
toxin deaminase might be trafficked via any one of eight 
distinct secretory systems in different organisms with 
varying functional outcomes (Table 1; Figures 4 and 5). 
By analyzing the N-terminal domains and gene neighbor- 
hoods of the deaminase toxins, we were able to identify 



Figure 5. Continued 

(B) Domains linked in a polypeptide are indicated by solid lines, whereas contextual linkages between genes in operons are indicated by dashes of
different colors. Lines are colored based on the deaminase clade. Black arrows indicate the polarity of domain arrangement in a polypeptide with the
arrowhead pointing to the C-terminus, and white arrows show the order of genes in operons from 5' to 3'. Multiple copies of domains or their direct
linkages in operons are shown with arrow cycles. Key protein domains that correspond to diverse secretion systems (T5SS, T2SS, T7SS, T6SS, PVC,
PrsW and the terminase system) are grouped together. Different deaminase clades are labeled with deaminase followed by numbers from 1 to 12.
Toxin domains that are present in polytoxins are linked with bold lines. For domain abbreviations, please refer to Figure 4 legend.

several distinct secretory mechanisms for them. Three of
these are widely used by polymorphic toxin systems with
immunity proteins: (i) in proteobacteria the predominant
secretory system that is used for deaminase toxins is the
Type V secretory system (T5SS, also called two-partner
secretory system). This is shared with several other toxin
domains and is a hallmark of the proteobacterial
contact-dependent inhibition systems. In this system, a
'TpsA-like secretory domain' (TpsA-SD) composed of
filamentous hemagglutinin repeats, present at the
N-terminus of the toxin protein, binds its partner, the
outer-membrane TpsB (FhaB/CdiB) protein, resulting in
the export of the toxin effector (26). These proteins are
characterized by a variable number of filamentous hemagglutinin
repeats and pre-toxin domains, such as the
PT-637 (Pfam database: DUF637) and PT-VENN
(which might recognize receptors on the target cell),
N-terminal to the deaminase domain (Figures 4 and 5;
Table 1). This indicates that these toxins occur at the
tips of long filamentous structures projecting at the cell
surface and are primarily delivered through contact.

(ii) In firmicutes and actinobacteria, those deaminase
toxins which are trafficked by the T5SS in proteobacteria
instead usually utilize the ESX/ESAT6 export pathway 
(also called Type VII secretory system, T7SS; Figures 4 
and 5 and Table 1) (74,75). Here, the toxin protein is 
typified by an N-terminal domain of the WXG superfam- 
ily, which is recognized by a multi-protein membrane- 
associated complex, and transported by the action of an 
ATPase pump of the YueA-like clade of the FtsK-HerA 
superfamily (76). A variant T7SS is seen in the firmicutes, 
in which the toxin effectors contain an N-terminal variant 
WxG domain (LDxD domain) that is always followed by 
a transmembrane helix, suggesting that the toxin is 
anchored to the cell membrane. Organizationally, these 
toxins might be either filamentous structures with 
deaminase domains at the tip (resembling the above 
versions from proteobacteria) or include smaller toxins 
with reduced central regions that might be secreted out. 

(iii) A potentially novel secretory mechanism that we un- 
covered in this study is prototyped by a novel polymorph- 
ic toxin system from Actinomyces with a deaminase 
domain of the DYW clade (gi: 320532150). In these 
proteins, the toxin domain is fused to an N-terminal
intramembrane peptidase domain of the PrsW family,
which comprises 12 TM helices (Table 1, Figure 4)
(77). This architecture suggests that the toxin is exported 
through the 12-TM PrsW-like channel and cleaved during 
this process by the intramembrane protease activity for 
release. 

Gene-neighborhood analysis indicates five other delivery 
systems that are widely associated with deaminase toxins, 
but unlike the above, in these cases the toxin genes
sometimes or always lack adjacent genes encoding
immunity proteins (Table 1 and Figure 5). (i) The conventional
Sec-dependent system or the Type II secretory
system (T2SS), relying on an N-terminal signal peptide, 
which is the most common export pathway for secreted 
proteins (77), is used to deliver toxin deaminases across
several bacterial lineages. At least five distinct clades of
toxin deaminases, often with filamentous N-terminal
regions, from both major divisions of the superfamily
utilize this pathway (Figures 4 and 5; Table 1). In
addition to the polymorphic toxin systems with immunity
proteins, which are deployed against related bacterial
strains, the T2SS is also used by systems lacking an
immunity protein from endosymbiotic or endoparasitic
bacteria and certain other forms like Solibacillus (e.g.
SSIL_0818) to deliver deaminase toxins to their hosts or
target cells. (ii) The type VI secretion system (T6SS),
which is mainly found in proteobacteria, is an evolutionary
exaptation of the DNA delivery system of the caudate
phages for extruding proteins out of the producing cell
(78). Its core comprises the VgrG protein, a fusion of
the T4 gp5 and gp27-like proteins that forms a channel
through the periplasm and the outer membrane of the
proteobacterial cell. This system might include other 
homologs of phage tail/base-plate proteins and use 
ClpV, a ClpB-like AAA+ ATPase, to provide energy for
export (79). Another key component of this system that
is often next to the VgrG gene is a gene encoding a protein
containing a MOG1/PspB-like (DUF1795) domain
(Table 1; Figures 4 and 5). Based on its contextual asso- 
ciations, we predict that this domain is a key structural 
component of the T6SS that might associate with the toxin 
protein during its delivery (Zhang, D., Iyer,L.M. and 
Aravind,L., manuscript in preparation). Certain toxins 
exported by this pathway might also contain N-terminal 
RHS repeats suggesting that, like the above toxins, they 
too might be deployed at the tips of filaments adorning the 
cell surface. Several deaminase toxins of plant and animal 
pathogens, such as Xanthomonas oryzae and certain 
Burkholderia species, which belong to the clades typified 
by the Orientia OTT_1508 and Xanthomonas XOO_2897,
are delivered by this mechanism. A few of these deaminase
toxins are associated with immunity proteins (e.g. Imm3,
Imm4, Imm5), suggesting that they are conventional polymorphic
toxins that might be deployed against closely related
strains. However, versions like XOO_2897 itself lack
adjacent immunity proteins, suggesting that they might
be deployed against the plant host. (iii) The third such
delivery system found across proteobacteria, actinobacteria
and firmicutes in the neighborhood of deaminase
toxins is the Photorhabdus virulence cassette (PVC) 
pathway (80). These toxins entirely lack associated immun- 
ity proteins. Like the T6SS, it uses VgrG and phage base- 
plate related proteins to constitute a delivery channel, but 
differs in utilizing a CDC48-like AAA+ superfamily
ATPase, instead of ClpV, to power export (Table 1; 
Figures 4 and 5). Another distinctive feature of the PVC 
systems, which we discovered, was the presence of a 
metallopeptidase domain immediately N-terminal to the 
toxin domains (Figure 4, Zhang, D., Iyer,L.M. and 
Aravind,L., manuscript in preparation). This is analogous 
to the HINT domain, which we earlier reported as being 
similarly linked to the N-terminal polymorphic toxins to 
provide an autoproteolytic release mechanism (24). Hence, 
we suggest that release of the toxin domains by the PVC 
delivery system might involve an autoproteolytic release 
by the metallopeptidase. Deaminase toxins from chloroflexi,
cyanobacteria, fibrobacteria and some gammaproteobacteria
are predicted to use such a
metallopeptidase for their release (Table 1 and Figure 4).
Several toxin deaminases are fused to a VgrG-like protein
or encoded by gene neighborhoods encoding other T6SS
or PVC system proteins, but lacking an ATPase gene
(Figure 5 and Table 1). However, in these instances, an
appropriate AAA+ ATPase gene is always encoded at
distant genomic locations, suggesting that export
is mediated by this gene product.
Alternatively, they might represent incomplete cassettes
that reconstitute a complete export system via
intragenomic recombination uniting distantly encoded
components. (iv) The fourth export pathway, which
mainly appears to be exploited to deliver deaminase 
toxins into host cells by parasites, is the poorly understood 
TcdB/TcaC-like export pathway. The conserved domains 
of this export system are the N-terminal SpvB domain, 
integrin-like β-propeller repeats and RHS repeats (the
latter two domains are annotated as 'TcdB Middle N
domain' and 'TcdB Middle C domain', respectively, in
the Pfam database). This system was previously observed
in the export of toxins of parasites of eukaryotes such as
Photorhabdus luminescens (TcdB, TcaC, TccC) (81) and
Serratia entomophila (SepB, SepC) (82). The phage-encoded
AID/APOBEC family deaminase toxin, B3gp45,
of the Wolbachia endosymbiont of Cadra cautella
(Figure 4) is predicted to deploy this export mechanism.
The presence of RHS repeats in these toxins suggests that
they too might be displayed at the tip of filamentous structures
on the cell surface. (v) Finally, at least one
deaminase toxin encoded in Clostridium perfringens 
(AC5_0860, gi: 168214630) and several other distinct 
nuclease toxin systems (Zhang, D., Iyer,L.M. and 
Aravind,L., manuscript in preparation) are in a gene 
neighborhood that includes genes homologous to compo- 
nents of the DNA-packaging system of caudate phages 
(83). These primarily include the genes for the large and 
small terminase subunits and capsid. In these instances, it 
is possible that the toxin is packaged into a phage capsid 
and represents a mechanism of toxin transfer analogous to 
phage transduction. 
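
The eight delivery routes described above are distinguished by characteristic N-terminal domains or neighboring genes, and this mapping can be sketched as an ordered rule table. The marker labels, the function name and the rule order below are illustrative assumptions rather than the procedure used in this study. In particular, the signal peptide rule for the T2SS is placed last on the assumption that signal peptides can accompany toxins exported by other systems.

```python
from typing import Iterable, List, Optional, Set, Tuple

# (pathway, markers that must all be present), checked in order.
# Marker names are hypothetical labels for the domains named in the text.
RULES: List[Tuple[str, Set[str]]] = [
    ("T5SS", {"TpsA-SD"}),                      # two-partner secretion
    ("T7SS", {"WXG"}),                          # ESX/ESAT6 export pathway
    ("PrsW-like membrane peptidase pathway", {"PrsW"}),
    ("T6SS", {"VgrG", "ClpV"}),                 # ClpV AAA+ ATPase
    ("PVC", {"VgrG", "CDC48"}),                 # CDC48-like AAA+ ATPase
    ("TcdB/TcaC-like", {"SpvB", "RHS"}),
    ("Terminase pathway", {"TerminaseL", "TerminaseS"}),
    ("T2SS", {"SP"}),                           # signal peptide only
]


def infer_secretion_system(markers: Iterable[str]) -> Optional[str]:
    """Return the first pathway whose required markers are all present in
    the toxin protein or its gene neighborhood, or None if none match."""
    present = set(markers)
    for pathway, required in RULES:
        if required <= present:
            return pathway
    return None


if __name__ == "__main__":
    print(infer_secretion_system(["SP", "TpsA-SD", "Deaminase"]))
    print(infer_secretion_system(["VgrG", "CDC48", "Metallopeptidase"]))
```

A neighborhood containing VgrG but no ATPase gene returns no call under these rules, which matches the situation noted above in which the ATPase is encoded at a distant genomic location.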

These observations indicate that the secretory mechan- 
isms are a potential factor in dictating if a given deaminase 
might be deployed as a conventional polymorphic toxin 
against closely related strains or against host cells/distant- 
ly related organisms. However, in both these cases, the 
toxins might display similar structural features, such as 
long N-terminal filamentous elements, with the toxin 
domain presented at the tip. With the exception of the 
PVC secretory systems, certain examples of T6SSs and 
some ESX/T7SS-delivered proteins, all export systems 
traffic toxin proteins with N-terminal filamentous 
regions suggesting that incidental contact with the target 
cell, at some distance from the producing cell-surface, is 
important for toxin deployment once it has been exported. 
This is also supported by the presence of globular 
domains, such as the Lamin G, immunoglobulin and 
the RicinB-like lectin domains, in addition to the 
N-terminal filamentous regions in several deaminase 
toxins (Figures 4 and 5). These might function as 
adhesion modules that help anchor the filaments to the 
producing cell or in enhancing contact with other cells. 



On the other hand, similar toxins using PVC, TcdB/ 
TcaC-like and type VI secretory systems exploit a more 
directed process of injection into target cells. This is par- 
ticularly suitable for pathogenic bacteria and probably 
also for free-living forms against certain environmental 
competitors which they encounter in specific contexts. 

Newly detected deaminases point to a widespread, 
previously unexpected distribution for potential defensive, 
mutagenic and editing functions across eukaryotes. One 
of our key findings is the identification of eukaryotic rep- 
resentatives from most of the novel clades defined by the 
bacterial toxin deaminases (Table 1; Figures 2 and 4). 
However, both the available experimental evidence and 
their domain architectures suggest that most of these eu- 
karyotic cognates are unlikely to function as secreted 
toxins. Nevertheless, the counter-viral action that has 
been demonstrated for members of the APOBEC clade 
(8,84) and the vertebrate-specific ADAR (85) is reminis- 
cent of the nucleic acid-targeting action of the bacterial 
toxins — in a sense they might be considered defensive 
anti-viral toxins. Like their bacterial counterparts, the 
newly detected members of the AID/APOBEC clade are 
remarkable for their sporadic phyletic distribution and 
extreme divergence (Figure 3). They are currently only 
known from the sea anemone Nematostella (three 
paralogs; e.g. NEMVEDRAFT_v1g248558), nematodes
including C. elegans (e.g. ZK287.1), Micromonas and
Emiliania. The nematode versions further contain an
N-terminal module with eight CXC motifs, a previously
characterized DNA-binding module found in several eu- 
karyotic chromatin proteins (86). The strongly divergent 
eukaryotic representatives of the OTT_1508-like clade are 
similarly sporadic in their distribution and were detected 
in the lycophyte Selaginella (SELMODRAFT_427619) and the
early-branching metazoan Trichoplax. The extreme diver- 
gence and sporadic distribution of these eukaryotic 
deaminases suggest that they might be under selective pres- 
sure for diversification and prone to gene loss or lateral 
transfer. This supports their being involved in a defensive 
function against viruses that are also rapidly evolving to 
evade host defenses. They might also operate on selfish 
elements as suggested by the editing of transcripts derived 
from repetitive and selfish elements such as Alu in humans 
and other vertebrates (16,87). Alternatively, rather than 
directly mutating the pathogenic nucleic acids, they 
might help in generating variability in an endogenous de- 
fensive molecule via hypermutation to help it recognize 
diversifying parasite moieties (as has been proposed for
AID and its relatives in generating variability of vertebrate 
lymphocyte molecules) (17,88,89). Maintenance of these 
mutagenic proteins might have also been favored by re- 
cruitment to certain endogenous cellular functions that 
might not be mutually exclusive from their defensive roles. 
It is conceivable that such editing activities of deaminases 
on certain nuclear gene transcripts and short non-coding 
RNAs favored their fixation because of some selective advantage
provided by the edited product (e.g. miRNA precursors
and apolipoprotein B transcript). Thus, the
Nematostella NEMVEDRAFT_v1g248558-like deaminases
may be involved in editing various miRNAs or
piRNAs that have been found in this species (90). While
miRNA editing has also been observed in nematodes, the
presence of an N-terminal DNA-binding domain in these
proteins favors a role in mutagenizing DNA, perhaps
comparable with certain vertebrate AID/APOBEC 
family members. 

Mitochondrial and chloroplast genomes are prone to 
accumulation of potentially deleterious mutations due to 
reduced recombination, depleted DNA repair mechanisms 
and reduced effective population size (91,92). In this con- 
text, mutagenic deaminases could offer potential error 
correction mechanisms, against the widespread forces of 
organellar genome mutation, by editing mRNAs to 
restore terminated ORFs or missense codons. Such re- 
cruitment for organellar RNA editing is consistent with 
what has been experimentally observed for certain repre- 
sentatives of the DYW-like clade in land plants, which 
display about 400-500 C to U editing events in mitochon- 
drial mRNAs and 35-40 events in the chloroplast 
(6,61,93). Likewise, the lineage-specific expansions of the 
DYW clade in Naegleria and the versions in the 
mushroom Laccaria and rotifers could also be involved 
in mitochondrial RNA editing, which might be distinct 
from the mitochondrial mRNA editing characterized in 
the kinetoplastids, which restore ORFs using guide 
RNAs and multiple nucleic acid processing enzymes to 
catalyze insertions or deletions (7). These proteins are 
typified by considerable variability in their N-terminal 
RNA-binding PPR repeats (Figure 5), suggesting that 
these play a role in recognition of diverse RNA sequences. 
The DYW deaminases from ascomycete fungi and oomy- 
cetes (which are stramenopiles) represent independent 
transfers from bacteria. The former are fused to ankyrin 
repeats instead of the PPR repeats; it would be of interest
to investigate whether these versions might have, in parallel,
acquired organellar mRNA editing capability. Another 
aspect of organellar genomes, which could favor recruit- 
ment of these deaminases, is the use of alternative genetic 
codes. In Leishmania tarentolae, an editing event has been 
shown to catalyze a C to U deamination in the anti-codon of
tRNA(Trp) that is associated with the use of an alternative
genetic code (94). In this study, we uncovered a 
Leishmania-specific deaminase, prototyped by 
Leishmania major LmjF33.1760 belonging to the clade 
typified by OTT_1508, with orthologs conserved across 
other Leishmania species. Given that it belongs to the 
second great division of deaminases (Table 1), which 
includes the Tad1, Tad3 and Tad2 tRNA deaminases, we
predict that it might be a tRNA editing deaminase that 
could catalyze modifications, such as that mentioned 
above. The alveolate parasite Perkinsus marinus encodes 
two apparently inactive deaminases belonging to the 
clade typified by the Burkholderia BURPS668_1122
protein. Their predicted N-terminal transit peptides 
suggest a potential organellar function; however, as they 
are predicted to be inactive, they probably function
merely as regulatory RNA-binding proteins.

We also recovered at least five other groups of novel 
deaminases that might be involved in editing of tRNAs, 
small non-coding RNAs, nuclear transcripts or organellar 
mRNA. The first of these is the unusual Tad3 of
basidiomycete fungi, which is fused to an N-terminal SET
domain (e.g. Cryptococcus CNBC2910 gi: 134109371, 
Figure 4). Tad3 typically functions as a catalytically 
inactive subunit of Tad2 in wobble base editing, while all 
characterized SET domains are protein lysine 
methyltransferases (95). This unusual fusion suggests that 
the SET domain might be involved in methylation of 
RNA-editing proteins. Alternatively, it might be involved 
in the synthesis of an as yet unrecognized modified RNA 
base at the wobble or a proximal position, which contains 
an aliphatic amine moiety similar to lysine. We also re- 
covered members of the OTT_1508 clade in the 
apicomplexans Toxoplasma and Neospora (e.g. gi: 
237838551), where they are fused to the tRNA 
transglycosylase (Figure 4), which is involved in replacing 
guanine at the wobble position with queuine in tRNA(Asp),
tRNA(Asn), tRNA(His) and tRNA(Tyr) (96,97). This suggests
that they might catalyze a lineage-specific tRNA deamin- 
ation, possibly at the wobble position. Chlorophyte algae 
and the bacterium Bdellovibrio possess deaminases (e.g. 
MICPUN_102230 and Bd3614), which define a distinct 
branch of the bacterial TadA clade (Figure 3). These 
deaminases are typified by a distinct N-terminal globular 
domain and in Bdellovibrio, it occurs in a predicted operon 
with a 23S rRNA G2445-modifying methylase (Figure 5). 
This suggests that it might mediate an RNA editing event
distinct from the tRNA modification catalyzed by TadA. 
Another group of deaminases, which represent a distinct 
branch of the CDD/CDA-like clade (e.g. Leishmania 
LmjF36.5940), are widely distributed across several microbial
eukaryotes, namely kinetoplastids, chlorophyte algae,
stramenopiles and the alveolate Perkinsus. The kinetoplastid
versions are fused to two N-terminal CCCH
Zn-finger domains and also contain an insert of a distinct 
Zn-chelating domain within the deaminase domain (Figure 
4). The chlorophyte and stramenopile versions have an 
uncharacterized N-terminal Rossmann-fold domain, 
whereas the Perkinsus version has a C-terminal 
Ub-binding Zn-ribbon domain. Given the role of the 
CCCH Zn fingers in binding single-stranded nucleic acids 
(98), it is possible that these proteins might possess mRNA 
editing or DNA mutagenizing activity. A final group of 
potential RNA-editing deaminases constitute yet another 
novel branch of the CDD/CDA-like clade and are re- 
stricted to stramenopiles. These proteins are characterized 
by an N-terminal deaminase domain followed by a SAM
domain and 1-22 tudor domains (Figure 4). We observed 
that additional proteins with large tandem arrays of related 
tudor domains are a distinctive feature of stramenopiles. 
Proteins with tandem arrays of tudor domains have previ- 
ously been implicated in assembly of RNA complexes 
involved in certain arms of the RNAi system probably 
via recognition of dimethylated arginines that are 
enriched in various RNA-binding proteins (99,100). It 
has also been observed that certain tudor domain 
proteins regulate the A to I editing of microRNAs in 
animals (101). In light of these observations, it is conceivable
that these tudor domain proteins assemble an RNP
complex in which these deaminase domains edit short 
non-coding RNAs that are part of a stramenopile-specific 
branch of the RNAi system. 

A remarkable group of fungal deaminases belonging to
the clade typified by the Orientia OTT_1508 are distinguished
by a distinct α+β domain that is usually
N-terminal to the deaminase domain (Figure 4). They 
are likely to have been present in the ancestral fungus, 
as indicated by their presence in the chytrids, basidio- 
mycetes and ascomycetes, though they have been primar- 
ily retained in filamentous forms. In several fungi, they 
display lineage-specific expansions (e.g. up to 16 copies 
in Laccaria bicolor) and the deaminase domains are 
characterized by extreme divergence, suggesting that 
they are under diversifying selective pressure. A subset 
of them is fused to domains suggestive of a 
chromatin-related role, e.g. to a MYND finger and a 
SIR2-like deacetylase (e.g. Magnaporthe MGG_12698; 
gi: 145610470). These architectures suggest that they 
perhaps translocate to specific chromatin regions and 
might have a DNA mutagenizing role directed against 
selfish elements or, additionally in the case of pathogenic 
fungi, highly variable effector genes in the genome. 
Indeed, in certain fungi such mutagenic functions (e.g. 
repeat-induced point mutation) are well-known, and 
might involve the role of a deaminase (102,103). 
However, the notable expansions of these deaminases 
observed in several free-living filamentous fungi, such as 
mushrooms, point to other possibilities. On account of the 
anastomosing growth of their hyphae, filamentous fungi 
are particularly prone to invasion by parasitic nuclei, hi- 
jacking of a colony by non-self conidia germinating on 
hyphae, and cytoplasmic selfish elements, such as 
mycoviruses and senescence plasmids (104). Thus, analo- 
gous to the potential role of the bacterial deaminase toxins 
in non-self discrimination, we propose that the fungal 
deaminase might also provide a line of defense against 
the negative effects of heterokaryon formation. As in the 
case of the well-characterized heterokaryon incompatibility
loci (105), these deaminases could cause local cell death of 
the heterokaryon by a mutagenic process. Consistent with 
this, our studies also suggest that the catalytic hydrolase 
domain of HetC heterokaryon incompatibility protein has 
been derived from a toxin domain found in bacterial polymorphic
toxin systems (Zhang,D., Iyer,L.M. and
Aravind,L., unpublished data, manuscript in preparation).
Alternatively, these deaminases could be primarily 
directed against the infectious cytoplasmic agents or 
even defective/selfish organelles that are acquired both 
during heterokaryon formation and sexual cell fusion. A 
similar role is also conceivable for the mushroom versions 
of the YwqJ-like clade that display far fewer paralogs than 
those of the OTT_1508-like clade. 

Evolutionary implications and general conclusions 

Our analysis has considerably clarified the deep evolutionary
history of the deaminase-like fold. In particular, it suggests
novel activities for the JAB domain, independent of
its role as a deubiquitinating peptidase. This analysis
also points to a novel class of regulatory ADP- 
ribosylating/NAD-binding activities typified by the 
TM1506-like proteins. The higher order classification of 
the deaminase-like fold, in conjunction with phyletic
patterns, suggests that the deaminase superfamily arose
early in bacteria, followed by an ancient split to give rise 
to the two major divisions (Figure 2). Of these, the CDD/ 
CDA-like cytidine deaminase clade are the only pan- 
bacterial deaminases in the C-terminal hairpin division, 
while the Helix-4 division contains three clades, the 
dCMP, Tad2/TadA-like and riboflavin biosynthesis 
RibD deaminases, which are widely present across most 
major bacterial lineages. This suggests that the ancestral 
deaminase domain probably participated in conversion of 
cytosine to uracil (in nucleosides or nucleotides) in the con- 
text of nucleotide metabolism. Following the early split, 
members of the second division, in particular, appear to
have expanded in their functional capabilities, acquiring
further base and cofactor modification capabilities and a 
role in tRNA modification. In bacteria and archaea, most 
of the C-ending codons are read by anti-codons contain- 
ing a G at position 34 (106), suggesting that this was the 
ancestral condition. The emergence of tRNA anti-codon 
editing deaminase TadA in bacteria appears to have 
allowed the use of A for the first time at this position 
followed by its editing to I in the tRNA(Arg) (106). TadA
was acquired by the eukaryotes from a bacterium, most 
probably the endosymbiotic mitochondrial progenitor, 
followed by its duplication into the eukaryote-specific 
paralogs Tad2 and Tad3. This appears to have triggered 
the displacement of the ancestral G at position 34 by A 
(edited to I), not just in the arginine codon, but also those 
for isoleucine, alanine, leucine, proline, valine, serine and 
threonine (106). Thus, the TadA family acquired from 
bacteria early in eukaryotic evolution appears to have 
played a pivotal role in differentiating the eukaryotic
system of decoding the genetic code from the ancestral 
state acquired from their archaeal precursor. In contrast, 
archaea seem to have relatively infrequently acquired 
members of the deaminase superfamily from bacteria 
(Supplementary Data). One notable case is the 
Methanopyrus CDAT8, which appears to have emerged 
from the lateral transfer of a bacterial deoxycytidylate 
deaminase followed by fusion to the RNA-binding 
THUMP domain (107), resulting in an independent origin
of a tRNA editing enzyme. 

Emergence of the bacterial toxin systems that were 
either directed at closely related competitors or distantly 
related cells offered a fertile recruiting ground for enzymes 
operating on nucleic acids. This resulted in a further wave 
of diversification of the deaminase superfamily, with toxin 
deaminases being recruited from both the great divisions 
of the deaminase superfamily and combined with several 
distinct mechanisms for secretion and presentation. A 
notable finding from our study is the detection of such 
deaminase toxin domains in secreted toxins of several bac- 
terial symbionts and parasites of eukaryotes, including 
endosymbionts/endoparasites. This indicates that muta- 
genesis and editing of host RNAs might be a previously 
unknown mechanism by which host behavior is con- 
trolled. Strikingly, the relationship of these bacterial 
toxin deaminases to several clades of rapidly evolving and 
sporadically distributed eukaryotic deaminases suggests 
that eukaryotes acquired these molecules, probably via 
lateral gene transfer from their endosymbionts. This
provides an explanation for the 'sudden' evolutionary
provenance and patchy distribution of several deaminase 
clades such as the AID/APOBEC clade and the DYW 
clade. The former was most probably acquired from an 
endosymbiont version that resembled the Wolbachia 
phage encoded AID/APOBEC-like deaminase and the 
latter from a version resembling that found in bacterial 
polymorphic toxins. The newly extended phyletic pattern 
of AID/APOBEC-like deaminases, with representatives in 
basal metazoans (e.g. Nematostella), nematodes and dis- 
tantly related algal lineages, along with their previously 
known presence in vertebrates, points to a complex evolutionary
history for these proteins in eukaryotes. The
non-vertebrate eukaryotic versions and those from algae 
share a large insert between the two metal-chelating cyst- 
eines in addition to some other sequence features (Figure 3 
and Supplementary Data). Further, the Nematostella and 
algal versions share several additional features (Figure 4, 
see above). Certain specific sequence features uniting all 
eukaryotic AID/APOBEC-like deaminases (Figure 3) 
suggest that the most parsimonious scenario is a single 
introduction of these enzymes to eukaryotes from bacteria 
with a further history of intraeukaryotic transfers along 
with multiple gene losses. However, the extreme sequence 
divergence of these domains hampers testing of these scenarios
through phylogenetic analysis. Right in the common
ancestor of the jawed and jawless vertebrates, AID/ 
APOBEC-like deaminases appear to have split into two 
primary branches: the APOBEC4-like and the AID-like
clades. The former acquired a distinctive N-terminal
Zn-chelating domain with 2 cysteines and histidine and 
the fourth cysteine being supplied from within the core 
deaminase domain (between strand-2 and helix-2), which 
is likely to form a distinct nucleic-acid-binding interface 
(Supplementary Data). In jawless vertebrates, the AID- 
like branch spawned two mutagenic deaminases 
(PmCDA1 and PmCDA2) involved in diversification of
their variable lymphocyte receptors. In the course of the evolution
of jawed vertebrates, the AID-like branch further
diversified giving rise to AID itself and APOBEC2 (at the 
base of jawed vertebrates), APOBEC3 (in tetrapods) and 
APOBEC1 (in mammals). Evidence from the lamprey sug- 
gests that the common ancestor of the AID-like branch 
had already acquired a role in mutagenic diversification of 
immunity receptors (17). This function appears to have 
persisted through vertebrate evolution despite the acquisi- 
tion of unrelated immunity receptors by the jawed and 
jawless vertebrates. 

In conclusion, our finding of multiple eukaryotic 
deaminases associated with distinct clades of bacterial 
toxin deaminases strongly argues for multiple acquisitions 
of such mutagenic/RNA-editing deaminases by eukary- 
otes (Table 1 and Figure 2). Given the mutagenic potential 
of these deaminases, their dispersion via toxin systems 
could possibly make them mobile agents of 'evolvability' 
that are gained and lost by organisms. This possibility is of 
particular interest in light of recent studies that are 
bringing to light considerable differences between the se- 
quences of the genome and transcriptome of nuclear genes 
with alterations to the coding capacity (108). On a more 
general note, these deaminases represent just one of
several instances of domains from bacterial toxin systems
being captured and exapted by eukaryotes for their own 
regulatory or defensive functions. We had earlier shown 
that the EndoU RNase deployed in eukaryotic small nucleolar
RNA processing (109) has a similar origin from a
toxin domain of bacterial polymorphic toxin systems (24). 
At least two components of the eukaryotic Hedgehog sig- 
naling pathway, namely the HINT domain and the SUFU 
domain have been respectively acquired from an auto- 
proteolytic peptidase and immunity protein of the bacter- 
ial toxin systems. Similarly, the SUKH immunity protein 
from such systems has been widely used by both eukary- 
otes and their viruses as a versatile protein-protein inter- 
action scaffold (24). In light of this, it is tempting to suggest 
that the sudden emergence of the divergent deaminase 
domain of the tRNA(Ala) position 37 editing Tad1 protein
at the base of the eukaryotic tree might represent an early 
example of a toxin deaminase being captured from a bac- 
terial symbiont, prior to the last eukaryotic common 
ancestor. These observations underscore the potential im- 
portance of the widespread bacterial symbiosis in 
providing raw material for eukaryotic innovations, 
including key developmental pathways and adaptive 
immunity. In conclusion, the above results offer multiple 
testable hypotheses regarding the activities of deaminases 
and more generally, other members of the deaminase-like 
fold, such as the JAB domain. We hope that further studies 
on the molecules uncovered in this study lead to a better 
understanding of the biochemistry of deaminases in the 
context of previously unknown RNA editing and mutagen- 
esis events, as also their biological roles in counter-selfish 
element defense, erasure of epigenetic DNA modifications, 
diversification of immunity molecules, organellar gene ex- 
pression and self versus non-self discrimination. 